To our families


Preface 


Linear algebra is everywhere in the world of science and engineering. See [1, 4, 7, 
10, 12, 14-19, 21-23, 25]. The present book is meant as a text for a course on linear 
algebra at the first-year undergraduate level. It is self-contained. The purpose of 
the book is to provide a solid foundation for further study of advanced mathematics. 

At the beginning of the book, we introduce linear systems over the real field,
solutions of linear systems by Gauss-Jordan elimination, and basic matrix
terminology. In particular, we study elementary matrices to express the steps of
Gauss-Jordan elimination in matrix form.

We introduce determinant functions in Chapter 2 in order to study Cramer’s
rule, which gives an explicit formula for the solution of a linear system that has a
unique solution. We discuss fundamental properties of determinants and how to
evaluate determinants through cofactor expansions.

As a fundamental example of vector spaces, we first introduce the Euclidean
vector spaces in Chapter 3. We study the Cauchy-Schwarz inequality and linear
transformations between two Euclidean vector spaces. The most important properties
of the Euclidean vector spaces will be used to develop the concept of general
vector spaces later.

In Chapter 4, we begin with the definition of general vector spaces over the 
real field. We mainly study subspaces, linearly independent sets, and bases for 
vector spaces. As important examples, we discuss four fundamental matrix spaces 
and study their properties. The dimension theorem for subspaces, the dimension 
theorem for matrices, and consistency theorems are also included. 

As a superstructure of vector spaces, we introduce an inner product on general 
vector spaces in Chapter 5. By using the inner product, we can define notions of 
length, distance, angle, and orthogonality in general vector spaces. These notions are 
the foundation of subsequent studies on the Gram-Schmidt process for orthogonal 
bases and least squares problems. Besides, we also discuss the problem of change of 
basis in the last section of this chapter. 


Chapter 6 presents one of the most important topics in linear algebra: eigenvalues
and eigenvectors of square matrices. With these concepts and their related theorems,
we study how to diagonalize a diagonalizable matrix, especially a symmetric matrix. 
Finally, the Jordan decomposition theorem is briefly mentioned. 

In Chapter 7, we introduce general linear transformations between two general 
vector spaces and study their related properties which involve kernel, range, rank, 
nullity, inverse, and so on. We also discuss matrices of general linear transformations 
and show that a general linear transformation between two general vector spaces can 
be regarded as a matrix transformation between two Euclidean vector spaces. 

In the last chapter, we develop several important topics in linear algebra,
including quadratic forms, complex inner product spaces, Hermitian matrices, and
unitary matrices. A well-known fact in linear algebra is that the matrix product is
not commutative, i.e., in general,

$$XY \neq YX,$$

where $X$ and $Y$ are square matrices. Böttcher and Wenzel proposed the following
conjecture in 2005:

$$\|XY - YX\|_F \le \sqrt{2}\,\|X\|_F \|Y\|_F,$$

where $\|\cdot\|_F$ is the Frobenius norm. In the last part of the book, we give an
elementary proof of the Böttcher-Wenzel conjecture, where only several classical
theorems studied in the book are used.

In writing the present book, many friends have offered us help, advice,
comments, and encouragement. First, we would like to express gratitude to the
following people: Professors Raymond H.F. Chan, Hong-Kun Xu, Jin-Yun Yuan, 
Fu-Zhen Zhang, Zhao-Liang Xu, Chong Li, Dan-Fu Han, Wen Li, Jian-Long Chen, 
Qing-Biao Wu, Man-Chung Yeung, Yi-Min Wei, Che-Man Cheng, Michael K.P. 
Ng, Wai-Ki Ching, Fu-Rong Lin, Hai-Wei Sun, Hao-Min Zhou, Zheng-Jian Bai, 
Jian-Feng Cai, Vai-Kuong Sin, Gang Wu, Matthew M.H. Lin, Jin-Hua Wang, Wei- 
Ping Shen, Seak-Weng Vong, Siu-Long Lei, Kit-Ian Kou, Zhi-Gang Jia, Xiao-Shan 
Chen, Xiao-Fei Peng, Rong Huang, Juan Zhang, Ying-Ying Zhang, Hong-Kui Pang, 
Qing-Jiang Meng, Ze-Jia Xie, and Teng-Teng Yao. Special thanks go to one of
the greatest mathematicians in the world, Professor Shing-Tung Yau from the
Department of Mathematics, Harvard University, for providing us with valuable words
at the beginning of Chapter 5. Of course, we are particularly grateful to Mr. Xiang
Zhao for creating the cover painting for the book. Finally, we appreciate the most
important institution in the authors’ lives, the University of Macau, for supplying such a
wonderful intellectual atmosphere for writing the book. The book is dedicated to
the 40th anniversary of the University of Macau (1981-2021).
The writing of the book was supported by the research grants MYRG2019-00042-FST and
CPG2021-00035-FST from the University of Macau, and the research grant 0014/2019/A
from FDCT of Macau.
Chapter 1


Linear Systems and Matrices 


“No beginner’s course in mathematics can do without linear algebra.” 


— Lars Garding 


“Matrices act. They don’t just sit there.”
— Gilbert Strang 


Solving linear systems (systems of linear equations) is the most important problem
of linear algebra and possibly of applied mathematics as well. The information
in a linear system is usually arranged into a rectangular array, called a “matrix”. The
matrix is particularly important in developing computer programs to solve large linear
systems, because computers are well suited to managing numerical data
in arrays. Moreover, matrices are not only a simple tool for solving linear systems
but also mathematical objects in their own right. In fact, matrix theory has a variety
of applications in science, engineering, and mathematics. Therefore, we begin our
study with linear systems and matrices in the first chapter.


1.1 Introduction to Linear Systems and Matrices 


Let R denote the set of real numbers. We now introduce linear equations, linear 
systems, and matrices. 


1.1.1 Linear equations and linear systems 


We consider

$$a_1x_1 + a_2x_2 + \cdots + a_nx_n = b,$$

where $a_i \in \mathbb{R}$ ($i = 1,2,\ldots,n$) are coefficients, $x_i$ ($i = 1,2,\ldots,n$) are variables
(unknowns), $n$ is a positive integer, and $b \in \mathbb{R}$ is a constant. An equation of this
form is called a linear equation, in which all variables occur to the first power.
When $b = 0$, the linear equation is called a homogeneous linear equation. A
sequence of numbers $s_1, s_2, \ldots, s_n$ is called a solution of the equation if the
substitution $x_1 = s_1, x_2 = s_2, \ldots, x_n = s_n$ gives

$$a_1s_1 + a_2s_2 + \cdots + a_ns_n = b.$$

The set of all solutions of the equation is called the solution set of the equation.
Throughout the book, we use examples to make our points clear.


Example We consider the following linear equations: 


(a) $x + y = 1$.


(b) $x + y + z = 1$.


It is easy to see that the solution set of (a) is a line in the $xy$-plane and the solution set
of (b) is a plane in $xyz$-space.


We next consider the following m linear equations in n variables: 


$$
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\
&\;\;\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m,
\end{aligned}
\tag{1.1}
$$

where $a_{ij} \in \mathbb{R}$ ($i = 1,2,\ldots,m$; $j = 1,2,\ldots,n$) are coefficients, $x_j$ ($j = 1,2,\ldots,n$)
are variables, and $b_i \in \mathbb{R}$ ($i = 1,2,\ldots,m$) are constants. A system of linear equations
in this form is called a linear system. A sequence of numbers $s_1, s_2, \ldots, s_n$ is called
a solution of the system if $x_1 = s_1, x_2 = s_2, \ldots, x_n = s_n$ is a solution of each equation
in the system. A linear system is said to be consistent if it has at least one solution,
and inconsistent if it has no solution.


Example Consider the following linear system

$$
\begin{aligned}
a_{11}x + a_{12}y &= b_1 \\
a_{21}x + a_{22}y &= b_2.
\end{aligned}
$$

The graphs of these equations are lines, called $l_1$ and $l_2$. There are three possible cases
for the lines $l_1$ and $l_2$ in the $xy$-plane. See Figure 1.1.


• When $l_1$ and $l_2$ are parallel, there is no solution of the system.


• When $l_1$ and $l_2$ intersect at only one point, there is exactly one solution of
the system.


• When $l_1$ and $l_2$ coincide, there are infinitely many solutions of the system.


[Figure 1.1: three panels showing the cases "No solution", "One solution", and "Infinitely many solutions".]


1.1.2 Matrices 


The term matrix was first introduced by the British mathematician James Sylvester
in the 19th century. Another British mathematician, Arthur Cayley, developed the basic
algebraic operations on matrices in the 1850s. Today, matrices have become the
language to know.


Definition A matrix is a rectangular array of numbers. The numbers in the array
are called the entries in the matrix.


Remark The size of a matrix is described in terms of the number of rows and
columns it contains. Usually, a matrix with $m$ rows and $n$ columns is called an
$m \times n$ matrix. If $A$ is an $m \times n$ matrix, then we denote the entry in row $i$ and
column $j$ of $A$ by the symbol $(A)_{ij} = a_{ij}$. Moreover, a matrix with real entries will
be called a real matrix and the set of all $m \times n$ real matrices will be denoted by the
symbol $\mathbb{R}^{m\times n}$. For instance, a matrix $A$ in $\mathbb{R}^{m\times n}$ can be written as

$$
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix},
$$

where $a_{ij} \in \mathbb{R}$ for any $i$ and $j$. When compactness of notation is desired, the
preceding matrix can be written as

$$A = [a_{ij}].$$

In particular, if $A \in \mathbb{R}^{1\times 1}$, then $A = a_{11} \in \mathbb{R}$.


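We now introduce some important matrices with special sizes. A row matrix
is a general $1 \times n$ matrix $a$ given by

$$a = [a_1, a_2, \ldots, a_n] \in \mathbb{R}^{1\times n}.$$

A column matrix is a general $m \times 1$ matrix $b$ given by

$$
b = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix} \in \mathbb{R}^{m\times 1}.
$$

A square matrix is an $n \times n$ matrix $A$ given by

$$
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{bmatrix} \in \mathbb{R}^{n\times n}.
\tag{1.2}
$$

The main diagonal of the square matrix $A$ is the set of entries $a_{11}, a_{22}, \ldots, a_{nn}$ in
(1.2).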


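Linear system (1.1) can be written briefly in the following matrix form

$$
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\
a_{21} & a_{22} & \cdots & a_{2n} & b_2 \\
\vdots & \vdots & & \vdots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn} & b_m
\end{bmatrix},
$$

which is called the augmented matrix of (1.1).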


Remark When we construct an augmented matrix associated with a given linear 
system, the unknowns must be written in the same order in each equation and the 
constants must be on the right. 


1.1.3 Elementary row operations 


In order to solve a linear system efficiently, we replace the given system with its 
augmented matrix and then solve the same system by operating on the rows of the 
augmented matrix. There are three elementary row operations on matrices defined 
as follows: 


(1) Interchange two rows. 
(2) Multiply a row by a nonzero number. 
(3) Add a multiple of one row to another row. 


By using elementary row operations, we can always reduce the augmented matrix of 
a given system to a simpler augmented matrix from which the solution of the system 
is evident. See the following example. 
Example Consider the following system

$$
\begin{aligned}
x + y + z &= 6 \\
2x + 4y - 3z &= 1 \\
3x + 2y - 2z &= 1.
\end{aligned}
$$

The augmented matrix of the system is given by

$$
\begin{bmatrix}
1 & 1 & 1 & 6 \\
2 & 4 & -3 & 1 \\
3 & 2 & -2 & 1
\end{bmatrix}.
$$

By using elementary row operations, one can transform the augmented
matrix of the system to a simpler form,

$$
\begin{bmatrix}
1 & 1 & 1 & 6 \\
2 & 4 & -3 & 1 \\
3 & 2 & -2 & 1
\end{bmatrix}
\longrightarrow
\begin{bmatrix}
1 & 0 & 0 & 1 \\
0 & 1 & 0 & 2 \\
0 & 0 & 1 & 3
\end{bmatrix}.
$$

Then from the simpler form, we immediately have

$$x = 1, \quad y = 2, \quad z = 3,$$

which is the solution of the original system. See the next section for details.
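
The three elementary row operations can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `swap_rows`, `scale_row`, and `add_multiple` are hypothetical, and the particular sequence of operations below is one possible choice that reduces the augmented matrix of the example, not necessarily the sequence intended in the text.

```python
from fractions import Fraction

def swap_rows(m, i, j):
    """Operation (1): interchange rows i and j (0-based indices)."""
    m[i], m[j] = m[j], m[i]

def scale_row(m, i, c):
    """Operation (2): multiply row i by a nonzero number c."""
    if c == 0:
        raise ValueError("the multiplier must be nonzero")
    m[i] = [c * x for x in m[i]]

def add_multiple(m, target, source, c):
    """Operation (3): add c times row `source` to row `target`."""
    m[target] = [x + c * y for x, y in zip(m[target], m[source])]

# Augmented matrix of x + y + z = 6, 2x + 4y - 3z = 1, 3x + 2y - 2z = 1.
M = [[Fraction(v) for v in row]
     for row in [[1, 1, 1, 6], [2, 4, -3, 1], [3, 2, -2, 1]]]

add_multiple(M, 1, 0, -2)          # row 2 + (-2) x row 1
add_multiple(M, 2, 0, -3)          # row 3 + (-3) x row 1
swap_rows(M, 1, 2)                 # interchange rows 2 and 3
scale_row(M, 1, -1)                # (-1) x row 2
add_multiple(M, 2, 1, -2)          # row 3 + (-2) x row 2
scale_row(M, 2, Fraction(-1, 15))  # (-1/15) x row 3
add_multiple(M, 1, 2, -5)          # row 2 + (-5) x row 3
add_multiple(M, 0, 1, -1)          # row 1 + (-1) x row 2
add_multiple(M, 0, 2, -1)          # row 1 + (-1) x row 3

solution = [row[-1] for row in M]  # last column holds the values of x, y, z
```

Exact fractions are used instead of floating-point numbers so that the reduced matrix can be compared entry by entry without rounding error.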


1.2 Gauss-Jordan Elimination 


In this section, we develop a method called Gauss-Jordan elimination [1] for
solving linear systems. In fact, Gauss-Jordan elimination is one of the most frequently
used algorithms in scientific computing.


1.2.1 Reduced row-echelon form 


In the example of Subsection 1.1.3, we solved the given linear system by reducing
the augmented matrix to

$$
\begin{bmatrix}
1 & 0 & 0 & 1 \\
0 & 1 & 0 & 2 \\
0 & 0 & 1 & 3
\end{bmatrix},
$$

from which the solution of the system was evident. This is an example of a matrix
that is in reduced row-echelon form. We therefore give the following definition.


Definition A matrix in reduced row-echelon form must satisfy the following
conditions.

(i) The rows that consist entirely of zeros are grouped together at the
bottom of the matrix. The rows that consist entirely of zeros will be called zero
rows.

(ii) If a row does not consist entirely of zeros, then the first nonzero number in the
row is a 1. We call this a leading 1.

(iii) For two successive rows that both contain leading 1’s, the leading 1 in the higher
row occurs farther to the left than the leading 1 in the lower row.

(iv) Each column that contains a leading 1 has zeros in all its other entries.


Remark A matrix having properties (i), (ii), (iii), but not necessarily (iv), is said
to be in row-echelon form. The following example is in row-echelon form:


[Example matrix in row-echelon form; entries illegible.]


1.2.2 Gauss-Jordan elimination



Gauss-Jordan elimination is a standard technique for solving linear systems. It is
a step-by-step elimination procedure which reduces the
augmented matrix of a given linear system to reduced row-echelon form. The
solution set of the system can then be found by inspection. We illustrate the idea by
the following example.


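Example We solve the following system

$$
\begin{aligned}
-3x_2 + 7x_5 &= 15 \\
2x_1 + 6x_2 + 6x_3 + 4x_4 + 2x_5 &= 28 \\
2x_1 + 11x_2 + 6x_3 + 4x_4 - 9x_5 &= 5.
\end{aligned}
$$

The augmented matrix of the system is given by

$$
\begin{bmatrix}
0 & -3 & 0 & 0 & 7 & 15 \\
2 & 6 & 6 & 4 & 2 & 28 \\
2 & 11 & 6 & 4 & -9 & 5
\end{bmatrix}.
$$

Now, by using the elementary row operations, we are going to reduce the matrix to
reduced row-echelon form.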
Step 1. Interchange the top row with another row, if necessary, to bring a nonzero
entry to the top of the leftmost column that does not consist entirely of
zeros:

$$
\text{Interchange 1st row and 2nd row:} \quad
\begin{bmatrix}
2 & 6 & 6 & 4 & 2 & 28 \\
0 & -3 & 0 & 0 & 7 & 15 \\
2 & 11 & 6 & 4 & -9 & 5
\end{bmatrix}
$$

Step 2. If the entry that is now at the top of the column found in Step 1 is $a \neq 0$,
multiply the first row by $1/a$ in order to introduce the leading 1:

$$
1/2 \times \text{1st row:} \quad
\begin{bmatrix}
1 & 3 & 3 & 2 & 1 & 14 \\
0 & -3 & 0 & 0 & 7 & 15 \\
2 & 11 & 6 & 4 & -9 & 5
\end{bmatrix}
$$

Step 3. Add suitable multiples of the top row to the rows below so that all entries
below the leading 1 become zeros:

$$
\text{3rd row} + (-2) \times \text{1st row:} \quad
\begin{bmatrix}
1 & 3 & 3 & 2 & 1 & 14 \\
0 & -3 & 0 & 0 & 7 & 15 \\
0 & 5 & 0 & 0 & -11 & -23
\end{bmatrix}
$$

Step 4. Now cover the top row in the matrix and begin again with Step 1 applied
to the remaining submatrix. Continue in this way until the entire matrix is
in row-echelon form:

$$
(-1/3) \times \text{2nd row:} \quad
\begin{bmatrix}
1 & 3 & 3 & 2 & 1 & 14 \\
0 & 1 & 0 & 0 & -\frac{7}{3} & -5 \\
0 & 5 & 0 & 0 & -11 & -23
\end{bmatrix}
$$

$$
\text{3rd row} + (-5) \times \text{2nd row:} \quad
\begin{bmatrix}
1 & 3 & 3 & 2 & 1 & 14 \\
0 & 1 & 0 & 0 & -\frac{7}{3} & -5 \\
0 & 0 & 0 & 0 & \frac{2}{3} & 2
\end{bmatrix}
$$

$$
3/2 \times \text{3rd row:} \quad
\begin{bmatrix}
1 & 3 & 3 & 2 & 1 & 14 \\
0 & 1 & 0 & 0 & -\frac{7}{3} & -5 \\
0 & 0 & 0 & 0 & 1 & 3
\end{bmatrix}
$$

The entire matrix is now in row-echelon form. To find the reduced row-echelon
form we need the following additional step.
Step 5. Beginning with the last nonzero row and working upward, add suitable
multiples of each row to the rows above to introduce zeros above the leading
1’s:

$$
\text{2nd row} + (7/3) \times \text{3rd row:} \quad
\begin{bmatrix}
1 & 3 & 3 & 2 & 1 & 14 \\
0 & 1 & 0 & 0 & 0 & 2 \\
0 & 0 & 0 & 0 & 1 & 3
\end{bmatrix}
$$

$$
\text{1st row} + (-1) \times \text{3rd row:} \quad
\begin{bmatrix}
1 & 3 & 3 & 2 & 0 & 11 \\
0 & 1 & 0 & 0 & 0 & 2 \\
0 & 0 & 0 & 0 & 1 & 3
\end{bmatrix}
$$

$$
\text{1st row} + (-3) \times \text{2nd row:} \quad
\begin{bmatrix}
1 & 0 & 3 & 2 & 0 & 5 \\
0 & 1 & 0 & 0 & 0 & 2 \\
0 & 0 & 0 & 0 & 1 & 3
\end{bmatrix}
$$

The last matrix is in reduced row-echelon form.
The corresponding system is

$$
\begin{aligned}
x_1 + 3x_3 + 2x_4 &= 5 \\
x_2 &= 2 \\
x_5 &= 3.
\end{aligned}
$$

Since $x_1$, $x_2$, and $x_5$ correspond to leading 1’s in the reduced row-echelon form of the
augmented matrix, we call them leading variables. The remaining variables $x_3$ and
$x_4$ are called free variables. Solving for the leading variables yields

$$
\begin{aligned}
x_1 &= -3x_3 - 2x_4 + 5 \\
x_2 &= 2 \\
x_5 &= 3.
\end{aligned}
$$

Setting $x_3 = s$ and $x_4 = t$, we therefore obtain the solution set of the system,

$$
\begin{aligned}
x_1 &= -3s - 2t + 5 \\
x_2 &= 2 \\
x_3 &= s \\
x_4 &= t \\
x_5 &= 3,
\end{aligned}
$$

where $s$ and $t$ can take arbitrary values.
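
The elimination above can be mirrored by a short Python sketch. It is a simplified variant under stated assumptions: the function name `rref` is hypothetical, and the routine clears the entries above and below each leading 1 in a single pass instead of following Steps 1 to 5 in two phases. The reduced row-echelon form it reaches should be the same.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced row-echelon form, list of pivot column indices)."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    pivots = []
    r = 0  # index of the row that receives the next leading 1
    for c in range(n_cols):
        if r == n_rows:
            break
        # Look for a nonzero entry in column c at or below row r.
        p = next((i for i in range(r, n_rows) if m[i][c] != 0), None)
        if p is None:
            continue  # no leading 1 in this column
        m[r], m[p] = m[p], m[r]              # interchange two rows
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]      # introduce the leading 1
        for i in range(n_rows):              # zeros above and below it
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    return m, pivots

# Augmented matrix of the example system in x1, ..., x5.
augmented = [
    [0, -3, 0, 0, 7, 15],
    [2, 6, 6, 4, 2, 28],
    [2, 11, 6, 4, -9, 5],
]
R, pivots = rref(augmented)
# Columns without a leading 1 (excluding the constant column) are free.
free = [c for c in range(len(augmented[0]) - 1) if c not in pivots]
```

With 0-based column indices, `pivots` corresponds to the leading variables $x_1, x_2, x_5$ and `free` to the free variables $x_3, x_4$.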
1.2.3 Homogeneous linear systems


A linear system is said to be homogeneous if the constant terms are all zero.
Consider

$$
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= 0 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= 0 \\
&\;\;\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= 0.
\end{aligned}
$$

Obviously, $x_1 = x_2 = \cdots = x_n = 0$ is a solution of the system, which is called the
trivial solution. Any nonzero solutions are called nontrivial solutions. For nontrivial
solutions, we have the following theorem.


Theorem 1.1 A homogeneous linear system has infinitely many solutions if there
are more variables than equations.


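Proof Let

$$
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1m}x_m + \cdots + a_{1n}x_n &= 0 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2m}x_m + \cdots + a_{2n}x_n &= 0 \\
&\;\;\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mm}x_m + \cdots + a_{mn}x_n &= 0,
\end{aligned}
$$

where $m < n$. By using elementary row operations, one can obtain the reduced row-echelon
form of the augmented matrix of the system. It follows from the reduced
row-echelon form that the corresponding system has the following form

$$
\begin{aligned}
x_{k_1} + \textstyle\sum(\ ) &= 0 \\
x_{k_2} + \textstyle\sum(\ ) &= 0 \\
&\;\;\vdots \\
x_{k_r} + \textstyle\sum(\ ) &= 0,
\end{aligned}
$$

where $k_1 < k_2 < \cdots < k_r$ are numbers in the set $\{1,2,\ldots,n\}$ and $\sum(\ )$ denotes
sums that involve the $n - r$ free variables. We remark that $r \le m < n$ and usually
$k_1 = 1$. If $r = m$, then there is no zero row. Finally, we obtain

$$
\begin{aligned}
x_{k_1} &= -\textstyle\sum(\ ) \\
x_{k_2} &= -\textstyle\sum(\ ) \\
&\;\;\vdots \\
x_{k_r} &= -\textstyle\sum(\ ),
\end{aligned}
$$

which implies that the system has infinitely many solutions, since the $n - r \ge 1$
free variables can take arbitrary values.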


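Example We consider the following linear system with 6 variables and 4 equations,

$$
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + a_{14}x_4 + a_{15}x_5 + a_{16}x_6 &= 0 \\
a_{21}x_1 + a_{22}x_2 + a_{23}x_3 + a_{24}x_4 + a_{25}x_5 + a_{26}x_6 &= 0 \\
a_{31}x_1 + a_{32}x_2 + a_{33}x_3 + a_{34}x_4 + a_{35}x_5 + a_{36}x_6 &= 0 \\
a_{41}x_1 + a_{42}x_2 + a_{43}x_3 + a_{44}x_4 + a_{45}x_5 + a_{46}x_6 &= 0.
\end{aligned}
$$

The augmented matrix $A$ of the system is

$$
A = \begin{bmatrix}
a_{11} & a_{12} & a_{13} & a_{14} & a_{15} & a_{16} & 0 \\
a_{21} & a_{22} & a_{23} & a_{24} & a_{25} & a_{26} & 0 \\
a_{31} & a_{32} & a_{33} & a_{34} & a_{35} & a_{36} & 0 \\
a_{41} & a_{42} & a_{43} & a_{44} & a_{45} & a_{46} & 0
\end{bmatrix} \in \mathbb{R}^{4\times 7}.
$$

By using elementary row operations, if the reduced row-echelon form of $A$ is obtained
as

$$
\begin{bmatrix}
1 & b_{12} & 0 & 0 & 0 & b_{16} & 0 \\
0 & 0 & 1 & b_{24} & 0 & b_{26} & 0 \\
0 & 0 & 0 & 0 & 1 & b_{36} & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix},
$$

then the corresponding system is

$$
\begin{aligned}
x_1 + b_{12}x_2 + b_{16}x_6 &= 0 \\
x_3 + b_{24}x_4 + b_{26}x_6 &= 0 \\
x_5 + b_{36}x_6 &= 0.
\end{aligned}
$$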
We therefore have

$$
\begin{aligned}
x_1 &= -b_{12}x_2 - b_{16}x_6 \\
x_3 &= -b_{24}x_4 - b_{26}x_6 \\
x_5 &= -b_{36}x_6.
\end{aligned}
$$

From the proof of Theorem 1.1, it follows that $r = 3$, $k_1 = 1$, $k_2 = 3$, and $k_3 = 5$.
There are three free variables, namely $x_2$, $x_4$, and $x_6$. Thus, the system has infinitely
many solutions.
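
To make the role of the free variables concrete, the following Python sketch plugs in numbers. The numeric values chosen for $b_{12}, b_{16}, b_{24}, b_{26}, b_{36}$ are hypothetical and serve only as an illustration; the helper names `solution` and `residuals` are likewise assumptions.

```python
# Hypothetical numeric values for the entries b12, b16, b24, b26, b36.
b12, b16, b24, b26, b36 = 2, -1, 3, 4, 5

# Nonzero rows of the reduced row-echelon form (constant column omitted).
R = [
    [1, b12, 0, 0, 0, b16],
    [0, 0, 1, b24, 0, b26],
    [0, 0, 0, 0, 1, b36],
]

def solution(s, t, u):
    """Solution obtained by setting the free variables x2 = s, x4 = t, x6 = u."""
    x1 = -b12 * s - b16 * u
    x3 = -b24 * t - b26 * u
    x5 = -b36 * u
    return [x1, s, x3, t, x5, u]

def residuals(x):
    """Left-hand sides of the three reduced equations evaluated at x."""
    return [sum(c * v for c, v in zip(row, x)) for row in R]

trivial = solution(0, 0, 0)
# Each choice of the free variables gives a different solution.
nontrivial = [solution(s, 0, 0) for s in range(1, 4)]
```

Every triple `(s, t, u)` yields a solution, and distinct triples yield distinct solutions, which is why the solution set is infinite.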


1.3 Matrix Operations 


Matrices appear in many contexts other than as augmented matrices for linear 
systems. In this section, we begin our study on matrix theory by giving some basic 
definitions of the subject. We also introduce some operations on matrices and discuss 
their fundamental properties. 


1.3.1 Operations on matrices 


Now, we develop an arithmetic of matrices which contains the sum, difference, 
product of matrices, and so on. We have the following definition of operations on 
matrices. 


Definition


(i) Equality of matrices: Two matrices $A = [a_{ij}]$ and $B = [b_{ij}]$ are said to be equal,
denoted by $A = B$, if they have the same size and $a_{ij} = b_{ij}$ for all $i, j$.

(ii) Sum and difference: Let $A = [a_{ij}]$ and $B = [b_{ij}]$ have the same size. Then
$A + B$ is a matrix with the entries given by $(A + B)_{ij} := a_{ij} + b_{ij}$ for all $i, j$,
and $A - B$ is a matrix with the entries given by $(A - B)_{ij} := a_{ij} - b_{ij}$ for all
$i, j$.

(iii) Scalar multiplication: Let $A = [a_{ij}]$ and $c$ be any scalar. Then $cA$ is a matrix
with the entries given by $(cA)_{ij} := ca_{ij}$ for all $i, j$.

(iv) Linear combination of matrices: $\sum_{i=1}^{s} c_i A_{(i)}$, where $A_{(i)}$ ($1 \le i \le s$) are matrices
of the same size and $c_i$ ($1 \le i \le s$) are scalars.

(v) Matrix product: Let $A = [a_{ij}]$ be a general $m \times r$ matrix and $B = [b_{ij}]$ be a
general $r \times n$ matrix. Then the product of $A$ and $B$ is an $m \times n$ matrix denoted
by

$$
AB =
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1r} \\
a_{21} & a_{22} & \cdots & a_{2r} \\
\vdots & \vdots & & \vdots \\
a_{i1} & a_{i2} & \cdots & a_{ir} \\
\vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mr}
\end{bmatrix}
\begin{bmatrix}
b_{11} & b_{12} & \cdots & b_{1j} & \cdots & b_{1n} \\
b_{21} & b_{22} & \cdots & b_{2j} & \cdots & b_{2n} \\
\vdots & \vdots & & \vdots & & \vdots \\
b_{r1} & b_{r2} & \cdots & b_{rj} & \cdots & b_{rn}
\end{bmatrix}
$$

with entries $(AB)_{ij}$ defined by

$$
(AB)_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{ir}b_{rj} = \sum_{k=1}^{r} a_{ik}b_{kj}
$$

for all $1 \le i \le m$ and $1 \le j \le n$.


Remark In the definition of matrix product, the number of columns of the first
factor $A$ must be the same as the number of rows of the second factor $B$ in order to
form the product $AB$. If the condition is not satisfied, then the product is undefined.
Even if $A$ and $B$ are both $n \times n$ matrices, we usually have $AB \neq BA$.


Example 1 Consider the matrices 


4 1 0 4 
1 
a= ia ah afosa e 
1 2 1 0 
Since A is a 2 x 3 matrix and B is a 3 x 4 matrix, the product AB is a 2 x 4 matrix 
given by 
AB= 773 4 l 
4 7 2 0 
But the product BA is undefined. 
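Example 2 Let

$$
A = \begin{bmatrix} 1 & 5 & 4 \\ 2 & -3 & 0 \\ 0 & 4 & 1 \end{bmatrix}, \qquad
B = \begin{bmatrix} 3 & 1 & 2 \\ 0 & -1 & -2 \\ 1 & 6 & 2 \end{bmatrix}.
$$

Then the product of matrices $A$ and $B$ is given by

$$
AB = \begin{bmatrix} 1 & 5 & 4 \\ 2 & -3 & 0 \\ 0 & 4 & 1 \end{bmatrix}
\begin{bmatrix} 3 & 1 & 2 \\ 0 & -1 & -2 \\ 1 & 6 & 2 \end{bmatrix}
= \begin{bmatrix} 7 & 20 & 0 \\ 6 & 5 & 10 \\ 1 & 2 & -6 \end{bmatrix}
$$

and the product of matrices $B$ and $A$ is given by

$$
BA = \begin{bmatrix} 3 & 1 & 2 \\ 0 & -1 & -2 \\ 1 & 6 & 2 \end{bmatrix}
\begin{bmatrix} 1 & 5 & 4 \\ 2 & -3 & 0 \\ 0 & 4 & 1 \end{bmatrix}
= \begin{bmatrix} 5 & 20 & 14 \\ -2 & -5 & -2 \\ 13 & -5 & 6 \end{bmatrix}.
$$

Thus $AB \neq BA$.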
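
The definition of the matrix product translates directly into code. The following Python sketch (the function name `matmul` is an assumption, and matrices are represented as lists of rows) computes $(AB)_{ij} = \sum_k a_{ik}b_{kj}$ and rechecks Example 2.

```python
def matmul(A, B):
    """Product of an m x r matrix A and an r x n matrix B (lists of rows)."""
    r = len(B)
    if any(len(row) != r for row in A):
        # The number of columns of A must equal the number of rows of B.
        raise ValueError("product undefined: sizes do not match")
    n = len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(r)) for j in range(n)]
            for i in range(len(A))]

A = [[1, 5, 4], [2, -3, 0], [0, 4, 1]]
B = [[3, 1, 2], [0, -1, -2], [1, 6, 2]]

AB = matmul(A, B)
BA = matmul(B, A)
commute = (AB == BA)  # False for this pair
```

Calling `matmul` with incompatible sizes raises an error, which reflects the remark that such a product is undefined.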


1.3.2 Partition of matrices 


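A matrix can be partitioned into smaller matrices by inserting horizontal and vertical
rules between selected rows and columns. For instance, below are three possible
partitions of a general $4 \times 5$ matrix $A$. The first one is a partition of $A$ into four
submatrices $A_{11}$, $A_{12}$, $A_{21}$, and $A_{22}$; the second one is a partition of $A$ into its row
matrices $r_1$, $r_2$, $r_3$, and $r_4$; the third one is a partition of $A$ into its column matrices
$c_1$, $c_2$, $c_3$, $c_4$, and $c_5$:

$$
A = \left[\begin{array}{ccc|cc}
a_{11} & a_{12} & a_{13} & a_{14} & a_{15} \\
a_{21} & a_{22} & a_{23} & a_{24} & a_{25} \\ \hline
a_{31} & a_{32} & a_{33} & a_{34} & a_{35} \\
a_{41} & a_{42} & a_{43} & a_{44} & a_{45}
\end{array}\right]
= \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix},
$$

$$
A = \left[\begin{array}{ccccc}
a_{11} & a_{12} & a_{13} & a_{14} & a_{15} \\ \hline
a_{21} & a_{22} & a_{23} & a_{24} & a_{25} \\ \hline
a_{31} & a_{32} & a_{33} & a_{34} & a_{35} \\ \hline
a_{41} & a_{42} & a_{43} & a_{44} & a_{45}
\end{array}\right]
= \begin{bmatrix} r_1 \\ r_2 \\ r_3 \\ r_4 \end{bmatrix},
$$

$$
A = \left[\begin{array}{c|c|c|c|c}
a_{11} & a_{12} & a_{13} & a_{14} & a_{15} \\
a_{21} & a_{22} & a_{23} & a_{24} & a_{25} \\
a_{31} & a_{32} & a_{33} & a_{34} & a_{35} \\
a_{41} & a_{42} & a_{43} & a_{44} & a_{45}
\end{array}\right]
= \begin{bmatrix} c_1 & c_2 & c_3 & c_4 & c_5 \end{bmatrix}.
$$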


1.3.3 Matrix product by columns and by rows 


Sometimes it may be desirable to find a particular row or column of a matrix product
$AB$ without computing the entire product. The following results are useful for that
purpose. Let $A \in \mathbb{R}^{m\times r}$ and $B \in \mathbb{R}^{r\times n}$. Then

$$
j\text{th column matrix of } AB =
\begin{bmatrix}
\sum_{k=1}^{r} a_{1k}b_{kj} \\
\vdots \\
\sum_{k=1}^{r} a_{mk}b_{kj}
\end{bmatrix}
= A\,[\, j\text{th column matrix of } B \,]
$$

and

$$
i\text{th row matrix of } AB =
\left[\, \sum_{k=1}^{r} a_{ik}b_{k1}, \ldots, \sum_{k=1}^{r} a_{ik}b_{kn} \,\right]
= [\, i\text{th row matrix of } A \,]\,B.
$$


Remark Let $a_1, a_2, \ldots, a_m$ denote the row matrices of $A$ and $b_1, b_2, \ldots, b_n$ denote
the column matrices of $B$. It follows from the formulas above that

$$
AB = A\,[\, b_1 \mid b_2 \mid \cdots \mid b_n \,] = [\, Ab_1 \mid Ab_2 \mid \cdots \mid Ab_n \,],
\tag{1.3}
$$

which shows that $AB$ can be computed column by column, and

$$
AB = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_m \end{bmatrix} B
= \begin{bmatrix} a_1B \\ a_2B \\ \vdots \\ a_mB \end{bmatrix},
\tag{1.4}
$$

which shows that $AB$ can also be computed row by row.
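
Formulas (1.3) and (1.4) can be tried out numerically. The sketch below reuses the matrices of Example 2 in Subsection 1.3.1; the helper names `matmul`, `column`, and `row` are assumptions introduced for illustration.

```python
def matmul(A, B):
    """Matrix product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def column(M, j):
    """The jth column of M (0-based) as a column matrix."""
    return [[r[j]] for r in M]

def row(M, i):
    """The ith row of M (0-based) as a row matrix."""
    return [M[i]]

A = [[1, 5, 4], [2, -3, 0], [0, 4, 1]]
B = [[3, 1, 2], [0, -1, -2], [1, 6, 2]]

# Second column of AB, computed as A times the second column of B.
second_column = matmul(A, column(B, 1))
# Third row of AB, computed as the third row of A times B.
third_row = matmul(row(A, 2), B)

AB = matmul(A, B)  # full product, used only for comparison
```

Only one column or one row of $B$ or $A$ enters each partial computation, so the rest of the product is never formed.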


1.3.4 Matrix product of partitioned matrices 


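From the remark in Subsection 1.3.3, we know that the computation of a matrix
product can be completed by some special partitions of matrices. We now introduce
the general case. Let

$$
A = \begin{bmatrix}
A_{11} & A_{12} & \cdots & A_{1s} \\
A_{21} & A_{22} & \cdots & A_{2s} \\
\vdots & \vdots & & \vdots \\
A_{r1} & A_{r2} & \cdots & A_{rs}
\end{bmatrix} \in \mathbb{R}^{m\times l}, \qquad
B = \begin{bmatrix}
B_{11} & B_{12} & \cdots & B_{1t} \\
B_{21} & B_{22} & \cdots & B_{2t} \\
\vdots & \vdots & & \vdots \\
B_{s1} & B_{s2} & \cdots & B_{st}
\end{bmatrix} \in \mathbb{R}^{l\times n},
$$

where the number of columns of submatrix $A_{ik}$ is equal to the number of rows of
submatrix $B_{kj}$ for each $1 \le i \le r$, $1 \le k \le s$, and $1 \le j \le t$. Then we construct

$$
C = \begin{bmatrix}
C_{11} & C_{12} & \cdots & C_{1t} \\
C_{21} & C_{22} & \cdots & C_{2t} \\
\vdots & \vdots & & \vdots \\
C_{r1} & C_{r2} & \cdots & C_{rt}
\end{bmatrix} \in \mathbb{R}^{m\times n},
$$

where each submatrix of $C$ is given by

$$
C_{ij} = A_{i1}B_{1j} + A_{i2}B_{2j} + \cdots + A_{is}B_{sj} = \sum_{k=1}^{s} A_{ik}B_{kj}
$$

for $1 \le i \le r$ and $1 \le j \le t$. In fact, the partitioned matrix $C$ is nothing but
the product of matrices $A$ and $B$, i.e., $C = AB$. See the following example.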


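Example Let $A \in \mathbb{R}^{3\times 3}$ and $B \in \mathbb{R}^{3\times 2}$. Below are the partitions of $A$ and $B$:

$$
A = \left[\begin{array}{cc|c}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\ \hline
a_{31} & a_{32} & a_{33}
\end{array}\right]
= \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}, \qquad
B = \left[\begin{array}{c|c}
b_{11} & b_{12} \\
b_{21} & b_{22} \\ \hline
b_{31} & b_{32}
\end{array}\right]
= \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}.
$$

Then

$$
\begin{aligned}
\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
\begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}
&= \begin{bmatrix}
A_{11}B_{11} + A_{12}B_{21} & A_{11}B_{12} + A_{12}B_{22} \\
A_{21}B_{11} + A_{22}B_{21} & A_{21}B_{12} + A_{22}B_{22}
\end{bmatrix} \\[1ex]
&= \begin{bmatrix}
\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\begin{bmatrix} b_{11} \\ b_{21} \end{bmatrix}
+ \begin{bmatrix} a_{13} \\ a_{23} \end{bmatrix} [\, b_{31} \,]
&
\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\begin{bmatrix} b_{12} \\ b_{22} \end{bmatrix}
+ \begin{bmatrix} a_{13} \\ a_{23} \end{bmatrix} [\, b_{32} \,]
\\[2ex]
[\, a_{31} \;\; a_{32} \,] \begin{bmatrix} b_{11} \\ b_{21} \end{bmatrix} + [\, a_{33} \,][\, b_{31} \,]
&
[\, a_{31} \;\; a_{32} \,] \begin{bmatrix} b_{12} \\ b_{22} \end{bmatrix} + [\, a_{33} \,][\, b_{32} \,]
\end{bmatrix} \\[1ex]
&= \begin{bmatrix}
\begin{bmatrix} a_{11}b_{11} + a_{12}b_{21} \\ a_{21}b_{11} + a_{22}b_{21} \end{bmatrix}
+ \begin{bmatrix} a_{13}b_{31} \\ a_{23}b_{31} \end{bmatrix}
&
\begin{bmatrix} a_{11}b_{12} + a_{12}b_{22} \\ a_{21}b_{12} + a_{22}b_{22} \end{bmatrix}
+ \begin{bmatrix} a_{13}b_{32} \\ a_{23}b_{32} \end{bmatrix}
\\[2ex]
[\, a_{31}b_{11} + a_{32}b_{21} \,] + [\, a_{33}b_{31} \,]
&
[\, a_{31}b_{12} + a_{32}b_{22} \,] + [\, a_{33}b_{32} \,]
\end{bmatrix} \\[1ex]
&= \begin{bmatrix}
a_{11}b_{11} + a_{12}b_{21} + a_{13}b_{31} & a_{11}b_{12} + a_{12}b_{22} + a_{13}b_{32} \\
a_{21}b_{11} + a_{22}b_{21} + a_{23}b_{31} & a_{21}b_{12} + a_{22}b_{22} + a_{23}b_{32} \\
a_{31}b_{11} + a_{32}b_{21} + a_{33}b_{31} & a_{31}b_{12} + a_{32}b_{22} + a_{33}b_{32}
\end{bmatrix} = AB.
\end{aligned}
$$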
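
The block formula can also be checked with numbers. In the Python sketch below, the entries of `A` and `B` are hypothetical, the partition is the one used in the example, and the helper names (`matmul`, `add`, `block`, `assemble`) are assumptions made for illustration.

```python
def matmul(A, B):
    """Matrix product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def add(X, Y):
    """Entrywise sum of two matrices of the same size."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def block(M, rows, cols):
    """Submatrix of M with the given (0-based) row and column indices."""
    return [[M[i][j] for j in cols] for i in rows]

def assemble(blocks):
    """Glue a 2 x 2 grid of blocks back into one matrix."""
    (C11, C12), (C21, C22) = blocks
    top = [r1 + r2 for r1, r2 in zip(C11, C12)]
    bottom = [r1 + r2 for r1, r2 in zip(C21, C22)]
    return top + bottom

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # hypothetical 3 x 3 matrix
B = [[1, 0], [2, -1], [0, 3]]          # hypothetical 3 x 2 matrix

# Partition as in the example: rules after row 2, after column 2 of A,
# and after column 1 of B.
A11, A12 = block(A, [0, 1], [0, 1]), block(A, [0, 1], [2])
A21, A22 = block(A, [2], [0, 1]), block(A, [2], [2])
B11, B12 = block(B, [0, 1], [0]), block(B, [0, 1], [1])
B21, B22 = block(B, [2], [0]), block(B, [2], [1])

C = assemble([
    [add(matmul(A11, B11), matmul(A12, B21)),
     add(matmul(A11, B12), matmul(A12, B22))],
    [add(matmul(A21, B11), matmul(A22, B21)),
     add(matmul(A21, B12), matmul(A22, B22))],
])
```

The assembled matrix `C` agrees with the ordinary product `matmul(A, B)`, as the formula $C = AB$ predicts.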


1.3.5 Matrix form of a linear system 


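In fact, the matrix product has an important application in solving linear systems.
Consider linear system (1.1) of $m$ linear equations in $n$ unknowns. We can replace
the $m$ equations in this system with the single matrix equation

$$
\begin{bmatrix}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\
\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n
\end{bmatrix}
= \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}.
$$

By using the product of matrices, it follows that

$$
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
= \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}.
$$

Let

$$
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}, \qquad
x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \qquad
b = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}.
$$

Then the original system has been replaced by the single matrix equation

$$Ax = b.$$

Here $A$ is called the coefficient matrix of the system. Thus, the augmented matrix
for the system is obtained by adjoining $b$ to $A$ as the last column, i.e., $[\, A \mid b \,]$.


Remark Note that by using matrix operations on the above linear system, it can
also be written as follows:

$$
x_1 \begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{bmatrix}
+ x_2 \begin{bmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{bmatrix}
+ \cdots
+ x_n \begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{bmatrix}
= \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix},
$$

i.e.,

$$
x_1c_1 + x_2c_2 + \cdots + x_nc_n = b,
\tag{1.5}
$$

where $c_j$ is the $j$th column matrix of $A$ for $1 \le j \le n$.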
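
The two views of $Ax = b$, row by row and as the column combination (1.5), can be compared in a small Python sketch. It reuses the system from the example in Subsection 1.1.3 together with its solution $x = 1$, $y = 2$, $z = 3$; the variable names are assumptions made for illustration.

```python
A = [[1, 1, 1], [2, 4, -3], [3, 2, -2]]  # coefficient matrix
x = [1, 2, 3]                            # solution found in Subsection 1.1.3
b = [6, 1, 1]                            # constants on the right

# Row view: each entry of Ax is one equation's left-hand side.
Ax_rows = [sum(a * v for a, v in zip(row, x)) for row in A]

# Column view (1.5): Ax = x1*c1 + x2*c2 + x3*c3.
columns = [list(col) for col in zip(*A)]
Ax_cols = [0] * len(A)
for xj, cj in zip(x, columns):
    Ax_cols = [acc + xj * c for acc, c in zip(Ax_cols, cj)]
```

Both computations give the same column, and it equals `b` because `x` solves the system.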


1.3.6 Transpose and trace of a matrix 


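Definition The transpose of an $m \times n$ matrix $A = [a_{ij}]$, denoted by $A^T$, is defined
to be the $n \times m$ matrix with entries given by

$$(A^T)_{ij} = a_{ji}.$$

The trace of an $n \times n$ matrix $A = [a_{ij}]$ is given by

$$\operatorname{tr}(A) := \sum_{i=1}^{n} a_{ii}.$$

For instance,

$$
A = \begin{bmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33}
\end{bmatrix}, \qquad
A^T = \begin{bmatrix}
a_{11} & a_{21} & a_{31} \\
a_{12} & a_{22} & a_{32} \\
a_{13} & a_{23} & a_{33}
\end{bmatrix}, \qquad
\operatorname{tr}(A) = a_{11} + a_{22} + a_{33}.
$$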


Some important properties of the transpose are listed in the following theorem. 


Theorem 1.2 Let the sizes of matrices $A$ and $B$ be such that the stated operations
can be performed. Then

(a) $(A^T)^T = A$.
(b) $(A + B)^T = A^T + B^T$.
(c) $(kA)^T = kA^T$, where $k$ is any scalar.
(d) $(AB)^T = B^TA^T$.

Proof Parts (a), (b), and (c) are self-evident. We therefore only prove (d). Let

$$A = [a_{ij}] \in \mathbb{R}^{m\times r}, \qquad B = [b_{ij}] \in \mathbb{R}^{r\times n}.$$

Then the products $(AB)^T$ and $B^TA^T$ can both be formed and they have the same
size. It only remains to show that corresponding entries of $(AB)^T$ and $B^TA^T$ are
the same, i.e., for all $i, j$,

$$
\left((AB)^T\right)_{ij} = (B^TA^T)_{ij}.
\tag{1.6}
$$

Applying the definition of transpose of a matrix to the left-hand side of (1.6) and
then using the definition of matrix product, we obtain

$$
\left((AB)^T\right)_{ij} = (AB)_{ji} = \sum_{k=1}^{r} a_{jk}b_{ki}.
$$

To evaluate the right-hand side of (1.6), let $A^T = [\tilde{a}_{ij}]$ and $B^T = [\tilde{b}_{ij}]$, then

$$\tilde{a}_{ij} = a_{ji}, \qquad \tilde{b}_{ij} = b_{ji}.$$

Furthermore, we have for all $i$ and $j$,

$$
(B^TA^T)_{ij} = \sum_{k=1}^{r} \tilde{b}_{ik}\tilde{a}_{kj}
= \sum_{k=1}^{r} b_{ki}a_{jk}
= \sum_{k=1}^{r} a_{jk}b_{ki}
= \left((AB)^T\right)_{ij}.
$$

Thus, (d) holds.


Some important properties of the trace are included in the following theorem. 
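Theorem 1.3 Let $A = [a_{ij}]$ and $B = [b_{ij}]$ be $n \times n$ matrices. Then

(a) $\operatorname{tr}(A) = \operatorname{tr}(A^T)$.

(b) $\operatorname{tr}(AB) = \operatorname{tr}(BA)$.

(c) $\operatorname{tr}(\alpha A + \beta B) = \alpha \operatorname{tr}(A) + \beta \operatorname{tr}(B)$, where $\alpha$ and $\beta$ are any scalars.

(d) $\operatorname{tr}(AB - BA) = 0$.

(e) $\operatorname{tr}(B) = 0$ if $B^T = -B$.


Proof Part (a) is obvious. For (b), because addition is associative and commutative,
we have

$$
\operatorname{tr}(AB) = \sum_{i=1}^{n} (AB)_{ii}
= \sum_{i=1}^{n} \sum_{k=1}^{n} a_{ik}b_{ki}
= \sum_{k=1}^{n} \sum_{i=1}^{n} b_{ki}a_{ik}
= \sum_{k=1}^{n} (BA)_{kk}
= \operatorname{tr}(BA).
$$

For (c), we have

$$
\operatorname{tr}(\alpha A + \beta B) = \sum_{i=1}^{n} (\alpha A + \beta B)_{ii}
= \sum_{i=1}^{n} (\alpha a_{ii} + \beta b_{ii})
= \alpha \sum_{i=1}^{n} a_{ii} + \beta \sum_{i=1}^{n} b_{ii}
= \alpha \operatorname{tr}(A) + \beta \operatorname{tr}(B).
$$

For (d), it follows from (c) and (b) that

$$
\operatorname{tr}(AB - BA) = \operatorname{tr}(AB) - \operatorname{tr}(BA) = \operatorname{tr}(AB) - \operatorname{tr}(AB) = 0.
$$

For (e), by using (a), the given condition $B^T = -B$, and (c), we deduce

$$
\operatorname{tr}(B) = \operatorname{tr}(B^T) = \operatorname{tr}(-B) = -\operatorname{tr}(B).
$$

Thus, $\operatorname{tr}(B) = 0$.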
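
A numerical spot check of Theorems 1.2 and 1.3 can be sketched in Python. It is not a proof, only an illustration on the matrices of Example 2 in Subsection 1.3.1; the helper names and the matrix `S` (chosen here so that $S^T = -S$) are assumptions.

```python
def matmul(A, B):
    """Matrix product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(M):
    """Transpose: entry (i, j) of the result is entry (j, i) of M."""
    return [list(col) for col in zip(*M)]

def trace(M):
    """Sum of the main diagonal entries of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))

A = [[1, 5, 4], [2, -3, 0], [0, 4, 1]]
B = [[3, 1, 2], [0, -1, -2], [1, 6, 2]]
S = [[0, 2, -1], [-2, 0, 4], [1, -4, 0]]  # hypothetical matrix with S^T = -S

# Theorem 1.2 (d): (AB)^T = B^T A^T.
lhs = transpose(matmul(A, B))
rhs = matmul(transpose(B), transpose(A))

# Theorem 1.3 (b): tr(AB) = tr(BA), although AB and BA differ.
tr_AB = trace(matmul(A, B))
tr_BA = trace(matmul(B, A))

# Theorem 1.3 (c) with alpha = 2 and beta = 3.
combo = [[2 * a + 3 * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]
```

Here `trace(S)` is zero, in line with part (e) of Theorem 1.3.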


1.4 Rules of Matrix Operations and Inverses 


In this section, we study some basic properties of the arithmetic operations on 
matrices. 
1.4.1 Basic properties of matrix operations


Theorem 1.4 Let $A$, $B$, and $C$ be matrices and the sizes of matrices be assumed
such that the indicated operations can be performed. The following rules of matrix
operations are valid.

(a) $A + B = B + A$. (Commutative law for addition)
(b) $A + (B + C) = (A + B) + C$. (Associative law for addition)
(c) $(AB)C = A(BC)$. (Associative law for product)
(d) $A(B \pm C) = AB \pm AC$. (Left distributive law)
(e) $(B \pm C)A = BA \pm CA$. (Right distributive law)
(f) $a(B \pm C) = aB \pm aC$.
(g) $(a \pm b)C = aC \pm bC$.
(h) $a(bC) = (ab)C$.
(i) $a(BC) = (aB)C = B(aC)$.

Here $a$ and $b$ are any scalars.


Proof We only prove (c), the associative law for matrix product. The other parts
are left as an exercise. Assume that

$$
A = [a_{ij}] \in \mathbb{R}^{s\times n}, \qquad B = [b_{jk}] \in \mathbb{R}^{n\times m}, \qquad C = [c_{kl}] \in \mathbb{R}^{m\times r}.
$$

We want to show

$$(AB)C = A(BC).$$

Let

$$
V = AB = [v_{ik}] \in \mathbb{R}^{s\times m}, \qquad W = BC = [w_{jl}] \in \mathbb{R}^{n\times r}.
$$

Then

$$v_{ik} = \sum_{j=1}^{n} a_{ij}b_{jk}$$

for $1 \le i \le s$ and $1 \le k \le m$, and

$$w_{jl} = \sum_{k=1}^{m} b_{jk}c_{kl}$$

for $1 \le j \le n$ and $1 \le l \le r$. Since $(AB)C = VC$, the entry in row $i$ and column $l$
of matrix $VC$ is given as follows:

$$
(VC)_{il} = \sum_{k=1}^{m} v_{ik}c_{kl}
= \sum_{k=1}^{m} \left( \sum_{j=1}^{n} a_{ij}b_{jk} \right) c_{kl}
= \sum_{k=1}^{m} \sum_{j=1}^{n} a_{ij}b_{jk}c_{kl}
\tag{1.7}
$$

for $1 \le i \le s$ and $1 \le l \le r$. Since $A(BC) = AW$, the entry in row $i$ and column $l$
of matrix $AW$ is given as follows:

$$
(AW)_{il} = \sum_{j=1}^{n} a_{ij}w_{jl}
= \sum_{j=1}^{n} a_{ij} \left( \sum_{k=1}^{m} b_{jk}c_{kl} \right)
= \sum_{j=1}^{n} \sum_{k=1}^{m} a_{ij}b_{jk}c_{kl}
\tag{1.8}
$$

for $1 \le i \le s$ and $1 \le l \le r$. Because addition is associative and commutative, the
results in (1.7) and (1.8) are the same. Hence the proof is completed.


1.4.2 Identity matrix and zero matrix 


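We define the identity matrix and the zero matrix as follows:

$$
I = \begin{bmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{bmatrix}, \qquad
0 = \begin{bmatrix}
0 & 0 & \cdots & 0 \\
0 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 0
\end{bmatrix}.
$$


Remark Throughout the book, we use the symbol $I_n$ to denote the $n \times n$ identity
matrix. If there is no confusion, we sometimes use $I$ to denote the identity matrix
with an appropriate size. Besides, we also use $e_i$ to denote the $i$th column matrix
of $I_n$, i.e.,

$$
e_i = [\, 0, \ldots, 0, 1, 0, \ldots, 0 \,]^T,
$$

where the entry 1 is in the $i$th position.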

Theorem 1.5 Let the sizes of the matrices be such that the indicated operations 
can be performed. The following rules of matrix operations are valid. 


(a) AI=A, IA=A. 
(b) A+0=0+A=A. 


(c) A0=0, 0A=0. 
The proof of the theorem is trivial and we therefore omit it.


Example Let 
A = 


1 -1 
» 82 
—1 4 


Then $AB = 0$ even though both $A$ and $B$ are nonzero matrices. Thus, if $AB = 0$ for
$A \in \mathbb{R}^{m\times n}$ and $B \in \mathbb{R}^{n\times r}$, it does not necessarily follow that $A = 0$ or $B = 0$.
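
The phenomenon is easy to reproduce. In the Python sketch below, the pair `A`, `B` is an illustrative choice made here and is not necessarily the pair used in the example above.

```python
def matmul(A, B):
    """Matrix product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

# An illustrative pair of nonzero matrices whose product is the zero matrix.
A = [[1, -1], [-1, 1]]
B = [[1, 1], [1, 1]]

AB = matmul(A, B)
zero = [[0, 0], [0, 0]]
```

Both factors are nonzero, yet `AB` equals `zero`, so the cancellation law familiar from real numbers does not carry over to matrices.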


As the following theorem shows, the identity matrix is useful in studying reduced 
row-echelon forms of square matrices. The proof of the theorem is left as an exercise. 


Theorem 1.6 Let R be the reduced row-echelon form of a square matrix A. Then 
either R has a row of zeros or R= I. 


1.4.3 Inverse of a matrix 
Definition Let $A$ and $B$ be square matrices of the same size such that

$$AB = BA = I.$$

Then $B$ is called an inverse of $A$, denoted by $B = A^{-1}$, and $A$ is said to be
invertible. If no such $B$ exists, then $A$ is said to be not invertible.


Example Consider the matrices 


1 1 
b Be 
al 


A= 


One can verify that B is an inverse of A since 


Cie O 
deden 


The next theorem shows that an invertible matrix has exactly one inverse. 


AB = 


and 


BA = 


Theorem 1.7 Let B and C be both inverses of the matrix A. Then B = C. 
Proof Since B and C are both inverses of A, we have 


AB=BA=I, AC=CA=I. 
From $BA = I$, multiplying both sides on the right by $C$ yields

(BA)C=IC=C. 
However, we obtain by Theorem 1.4 (c), 


(BA)C = B(AC) = BI = B. 


Thus, C = B. 


For a 2 x 2 invertible matrix, the following theorem gives a formula for 
constructing the inverse. 


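Theorem 1.8 The matrix

\[ A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \]

is invertible if $ad - bc \neq 0$. The inverse is given by the formula

\[ A^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}
= \begin{bmatrix} \dfrac{d}{ad - bc} & -\dfrac{b}{ad - bc} \\[2ex] -\dfrac{c}{ad - bc} & \dfrac{a}{ad - bc} \end{bmatrix}. \]

Proof Verify that $AA^{-1} = I_2$ and $A^{-1}A = I_2$.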
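The formula of Theorem 1.8 translates directly into code. The sketch below is only an illustration: the function name `inverse_2x2` and the sample entries are assumptions, and exact rational arithmetic is used so that no rounding occurs.

```python
from fractions import Fraction

def inverse_2x2(a, b, c, d):
    """Return the inverse of [[a, b], [c, d]] using the formula of Theorem 1.8."""
    det = a * d - b * c
    if det == 0:
        raise ValueError("ad - bc = 0, so the formula does not apply")
    f = Fraction(1, det)
    return [[f * d, -f * b],
            [-f * c, f * a]]

inv = inverse_2x2(1, 2, 3, 4)   # hypothetical sample entries, ad - bc = -2
print(inv)
```

For these entries the result is the matrix with rows (-2, 1) and (3/2, -1/2).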


The following theorem is concerned with the invertibility of the product of
invertible matrices.

Theorem 1.9 Let A and B be n x n invertible matrices. Then AB is invertible
and

\[ (AB)^{-1} = B^{-1}A^{-1}. \]

In general,

\[ (A_{(1)}A_{(2)} \cdots A_{(p)})^{-1} = A_{(p)}^{-1} \cdots A_{(2)}^{-1}A_{(1)}^{-1}, \tag{1.9} \]

where $A_{(i)}$ $(1 \le i \le p)$ are n x n invertible matrices and p is any positive integer.


Proof Since A and B are invertible, we obtain by Theorem 1.4 (c),

\[ (AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = AA^{-1} = I \]

and

\[ (B^{-1}A^{-1})(AB) = B^{-1}(A^{-1}A)B = B^{-1}B = I. \]

Thus,

\[ (AB)^{-1} = B^{-1}A^{-1}. \]

We are now going to prove the general case of (1.9) by using induction. When p = 2,
we have just proved that (1.9) is true. We assume that (1.9) is true for p = k - 1,
i.e.,

\[ (A_{(1)}A_{(2)} \cdots A_{(k-1)})^{-1} = A_{(k-1)}^{-1} \cdots A_{(2)}^{-1}A_{(1)}^{-1}. \tag{1.10} \]

Considering the case of p = k, it follows from the case of p = 2 and (1.10) that

\[
\begin{aligned}
(A_{(1)}A_{(2)} \cdots A_{(k-1)}A_{(k)})^{-1} &= [(A_{(1)}A_{(2)} \cdots A_{(k-1)})A_{(k)}]^{-1} \\
&= A_{(k)}^{-1}(A_{(1)}A_{(2)} \cdots A_{(k-1)})^{-1} = A_{(k)}^{-1}(A_{(k-1)}^{-1} \cdots A_{(2)}^{-1}A_{(1)}^{-1}) \\
&= A_{(k)}^{-1}A_{(k-1)}^{-1} \cdots A_{(2)}^{-1}A_{(1)}^{-1}.
\end{aligned}
\]

Therefore, (1.9) is true for any positive integer p.


The following theorem gives a relationship between the inverse of an invertible
matrix A and the inverse of $A^T$.

Theorem 1.10 If A is invertible, then $A^T$ is also invertible and

\[ (A^T)^{-1} = (A^{-1})^T. \]

Proof We have by using Theorem 1.2 (d),

\[ A^T(A^{-1})^T = (A^{-1}A)^T = I^T = I \]

and

\[ (A^{-1})^TA^T = (AA^{-1})^T = I^T = I. \]

Thus, $(A^T)^{-1} = (A^{-1})^T$.


1.4.4 Powers of a matrix 
Definition If A is square, then we define the powers of A for any integer n > 0 by

\[ A^0 := I, \qquad A^n := \underbrace{AA \cdots A}_{n}. \]

Moreover, if A is invertible, then

\[ A^{-n} := (A^{-1})^n = \underbrace{A^{-1}A^{-1} \cdots A^{-1}}_{n}. \]

If r and s are integers, then it follows from the definition of powers that

\[ A^rA^s = A^{r+s}, \qquad (A^r)^s = A^{rs}. \]

Theorem 1.11 If A is invertible, then
(a) $A^{-1}$ is invertible and $(A^{-1})^{-1} = A$.
(b) $A^n$ is invertible and $(A^n)^{-1} = (A^{-1})^n$ for any integer n > 0.
(c) For any scalar $k \neq 0$, the matrix kA is invertible and $(kA)^{-1} = \frac{1}{k}A^{-1}$.


The proof of the theorem is straightforward and we therefore omit it. 


1.5 Elementary Matrices and a Method for Finding $A^{-1}$


We develop an algorithm for finding the inverse of an invertible matrix in this section. 


1.5.1 Elementary matrices and their properties 


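Definition An n x n elementary matrix is a matrix obtained by performing a single
elementary row operation on $I_n$. There are three types of elementary matrices.

(i) Interchange rows i and j of $I_n$. The resulting matrix, denoted by E(i, j), agrees
with $I_n$ except that its (i, i) and (j, j) entries are 0 and its (i, j) and (j, i) entries are 1.

(ii) Multiply row i of $I_n$ by c ($c \neq 0$). The resulting matrix, denoted by E(i(c)),
agrees with $I_n$ except that its (i, i) entry is c.

(iii) Add a multiple k of one row of $I_n$ to another row. For rows i and j, the resulting
matrix is denoted by E(i, j(k)).

Remark A square matrix is called a permutation matrix if it can be written as
a product of elementary matrices of type (i).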


Multiplying a matrix A on the left by an elementary matrix E performs the
corresponding elementary row operation on A. More precisely, we have the following
theorem.

Theorem 1.12 Let A be an m x n matrix. If the elementary matrix E results from
performing a certain row operation on $I_m$, then the product EA is the matrix that
results when this same row operation is performed on A.


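Proof We only prove the statement concerned with the elementary matrix E(i, j).
One can prove the statements concerned with E(i(c)) and E(i, j(k)) easily in
the same way. Let $r_1, r_2, \ldots, r_m$ denote the row matrices of A and $e_1, e_2, \ldots, e_m$
denote the column matrices of $I_m$. It follows that

\[
A = \begin{bmatrix} r_1 \\ \vdots \\ r_i \\ \vdots \\ r_j \\ \vdots \\ r_m \end{bmatrix}, \qquad
E(i, j) = \begin{bmatrix} e_1^T \\ \vdots \\ e_j^T \\ \vdots \\ e_i^T \\ \vdots \\ e_m^T \end{bmatrix},
\]

where $e_j^T$ is in row i and $e_i^T$ is in row j of E(i, j). Since $e_k^TA = r_k$ for $1 \le k \le m$, we have by (1.4),

\[
E(i, j)A = \begin{bmatrix} e_1^T \\ \vdots \\ e_j^T \\ \vdots \\ e_i^T \\ \vdots \\ e_m^T \end{bmatrix} A
= \begin{bmatrix} e_1^TA \\ \vdots \\ e_j^TA \\ \vdots \\ e_i^TA \\ \vdots \\ e_m^TA \end{bmatrix}
= \begin{bmatrix} r_1 \\ \vdots \\ r_j \\ \vdots \\ r_i \\ \vdots \\ r_m \end{bmatrix}.
\]

Hence E(i, j)A is a matrix that results when rows i and j of A are interchanged.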
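Theorem 1.12 can be illustrated on a small case. In the sketch below, the 3 x 4 matrix and the helper names (`identity`, `matmul`, `swap_rows`) are assumptions made for illustration; indices in the code are 0-based, so rows 0 and 2 correspond to rows 1 and 3 in the notation of the text.

```python
def identity(n):
    """Return the n x n identity matrix as a list of rows."""
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def swap_rows(M, i, j):
    """Return a copy of M with rows i and j interchanged (0-based indices)."""
    M = [row[:] for row in M]
    M[i], M[j] = M[j], M[i]
    return M

A = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12]]               # hypothetical 3 x 4 matrix

E = swap_rows(identity(3), 0, 2)    # E(1, 3): interchange rows 1 and 3 of I_3
left = matmul(E, A)                 # the product EA
right = swap_rows(A, 0, 2)          # the same row operation applied to A directly
print(left == right)  # True
```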


The following theorem is concerned with the invertibility of elementary matrices. 
The proof of the theorem is left as an exercise. 


Theorem 1.13 For three types of elementary matrices, we have

\[ E(i, j)^{-1} = E(i, j), \qquad E(i(c))^{-1} = E(i(1/c)), \qquad E(i, j(k))^{-1} = E(i, j(-k)). \]


Remark It follows from Theorem 1.13 that the inverse of any elementary matrix 
is still an elementary matrix. 


1.5.2 Main theorem of invertibility 


The next theorem establishes some equivalent statements of the invertibility of a 
matrix. These results are extremely important and will be used many times later. 


Theorem 1.14 Let A be an n x n matrix. Then the following statements are
equivalent, i.e., all true or all false.
(a) A is invertible.
(b) Ax = 0 has only the trivial solution.
(c) The reduced row-echelon form of A is $I_n$.
(d) A is expressible as a product of elementary matrices.


Proof It is sufficient to prove that (a) $\Rightarrow$ (b) $\Rightarrow$ (c) $\Rightarrow$ (d) $\Rightarrow$ (a).

(a) $\Rightarrow$ (b): Let $x_0$ be any solution of Ax = 0, i.e., $Ax_0 = 0$. Since A is invertible,
multiplying both sides of $Ax_0 = 0$ on the left by $A^{-1}$, we have $A^{-1}Ax_0 = A^{-1}0$, which implies
$I_nx_0 = 0$. Thus, $x_0 = 0$. Therefore, Ax = 0 has only the trivial solution.

(b) $\Rightarrow$ (c): Let R be the reduced row-echelon form of A. Then by Theorem 1.6,
$R = I_n$ or R has a zero row. If R has a zero row, then it follows from Theorem
1.1 that Rx = 0 has infinitely many solutions, i.e., nontrivial solutions. Therefore,
Ax = 0 has nontrivial solutions, which contradicts (b). Thus, $R = I_n$.

(c) $\Rightarrow$ (d): If the reduced row-echelon form of A is $I_n$, then there exist some
elementary matrices $E_{(1)}, E_{(2)}, \ldots, E_{(k)}$ such that

\[ E_{(k)} \cdots E_{(2)}E_{(1)}A = I_n. \tag{1.11} \]

By Theorem 1.13, we know that every elementary matrix is invertible and the inverse
of an elementary matrix is still an elementary matrix. It follows from (1.11) and
Theorem 1.9 that

\[ A = E_{(1)}^{-1}E_{(2)}^{-1} \cdots E_{(k)}^{-1}. \]

Thus, (d) holds.

(d) $\Rightarrow$ (a): It is obtained directly from Theorems 1.13 and 1.9.


1.5.3 A method for finding $A^{-1}$

In the following, we establish a method for constructing the inverse of an n x n
invertible matrix A. Multiplying both sides of (1.11) on the right by $A^{-1}$ yields

\[ A^{-1} = E_{(k)} \cdots E_{(2)}E_{(1)}, \]

where $E_{(1)}, E_{(2)}, \ldots, E_{(k)}$ are elementary matrices. We next construct an n x 2n
matrix $[\, A \mid I \,]$. We have by using (1.3),

\[
\begin{aligned}
E_{(k)} \cdots E_{(2)}E_{(1)} [\, A \mid I \,] &= [\, E_{(k)} \cdots E_{(2)}E_{(1)}A \mid E_{(k)} \cdots E_{(2)}E_{(1)}I \,] \\
&= [\, I \mid E_{(k)} \cdots E_{(2)}E_{(1)} \,] \\
&= [\, I \mid A^{-1} \,].
\end{aligned}
\]

Thus, the sequence of elementary row operations that reduces A to I actually
converts I to $A^{-1}$ simultaneously.


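Example Find the inverse of

\[ A = \begin{bmatrix} 1 & 0 & 2 \\ 3 & 1 & 6 \\ 2 & 7 & 3 \end{bmatrix}. \]

Solution The computations are as follows:

\[ [\, A \mid I \,] = \left[ \begin{array}{rrr|rrr} 1 & 0 & 2 & 1 & 0 & 0 \\ 3 & 1 & 6 & 0 & 1 & 0 \\ 2 & 7 & 3 & 0 & 0 & 1 \end{array} \right] \]

2nd row + (-3) x 1st row, 3rd row + (-2) x 1st row:

\[ \left[ \begin{array}{rrr|rrr} 1 & 0 & 2 & 1 & 0 & 0 \\ 0 & 1 & 0 & -3 & 1 & 0 \\ 0 & 7 & -1 & -2 & 0 & 1 \end{array} \right] \]

3rd row + (-7) x 2nd row:

\[ \left[ \begin{array}{rrr|rrr} 1 & 0 & 2 & 1 & 0 & 0 \\ 0 & 1 & 0 & -3 & 1 & 0 \\ 0 & 0 & -1 & 19 & -7 & 1 \end{array} \right] \]

(-1) x 3rd row:

\[ \left[ \begin{array}{rrr|rrr} 1 & 0 & 2 & 1 & 0 & 0 \\ 0 & 1 & 0 & -3 & 1 & 0 \\ 0 & 0 & 1 & -19 & 7 & -1 \end{array} \right] \]

1st row + (-2) x 3rd row:

\[ \left[ \begin{array}{rrr|rrr} 1 & 0 & 0 & 39 & -14 & 2 \\ 0 & 1 & 0 & -3 & 1 & 0 \\ 0 & 0 & 1 & -19 & 7 & -1 \end{array} \right] = [\, I \mid A^{-1} \,]. \]

Thus,

\[ A^{-1} = \begin{bmatrix} 39 & -14 & 2 \\ -3 & 1 & 0 \\ -19 & 7 & -1 \end{bmatrix}. \]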

1.6 Further Results on Systems and Invertibility 


We develop more results concerned with linear systems and invertibility of matrices 
in this section. 


1.6.1 A basic theorem 


Theorem 1.15 Every linear system has either no solution, exactly one solution, or
infinitely many solutions.


Proof If Ax = b is a system of linear equations, then exactly one of the following
is true:

(a) the system has no solution;
(b) the system has exactly one solution;
(c) the system has more than one solution.

The proof will be completed if we can show that the system has infinitely many
solutions in case (c). Assume that Ax = b has more than one solution, and let
$x_0 = x_1 - x_2$, where $x_1$ and $x_2$ are any two distinct solutions. Therefore, $x_0$ is
nonzero. Moreover,

\[ Ax_0 = A(x_1 - x_2) = Ax_1 - Ax_2 = b - b = 0. \]

Let k be any scalar. Then

\[ A(x_1 + kx_0) = Ax_1 + A(kx_0) = Ax_1 + k(Ax_0) = b + k0 = b + 0 = b, \]

i.e., $x_1 + kx_0$ is a solution of Ax = b. Since $x_0 \neq 0$ and there are infinitely many
choices for k, we conclude that Ax = b has infinitely many solutions.
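The construction in the proof can be tried on a small system. The matrix, right-hand side, and the two starting solutions below are hypothetical values chosen for illustration; the code forms $x_0 = x_1 - x_2$ and checks that $x_1 + kx_0$ solves the system for several values of k.

```python
def matvec(A, x):
    """Multiply the matrix A (list of rows) by the column vector x (list)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[1, 2],
     [2, 4]]
b = [3, 6]
x1 = [3, 0]                               # one solution of Ax = b
x2 = [1, 1]                               # a second, distinct solution
x0 = [u - v for u, v in zip(x1, x2)]      # x0 = x1 - x2 solves Ax = 0

# Every vector x1 + k*x0 is again a solution; here k runs over -3, ..., 3.
solutions = [[u + k * v for u, v in zip(x1, x0)] for k in range(-3, 4)]
print(all(matvec(A, x) == b for x in solutions))  # True
```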


1.6.2 Properties of invertible matrices 


By the definition of the inverse, to show that a square matrix A is invertible we
must find a square matrix B such that

AB=I, BA=I.

The next theorem shows that if B satisfies either condition, then the other condition
holds automatically.


Theorem 1.16 Let A be a square matrix.
(a) If B is square and satisfies BA = I, then $B = A^{-1}$.
(b) If B is square and satisfies AB = I, then $B = A^{-1}$.

Proof We only prove (a) and the proof of (b) is similar. We consider the system
Ax = 0 and show that this system only has the trivial solution. Let $x_0$ be any
solution of this system, i.e., $Ax_0 = 0$. Multiplying both sides on the left by B yields

\[ BAx_0 = B0. \]

Since BA = I, we deduce

\[ Ix_0 = 0, \quad \text{i.e.,} \quad x_0 = 0. \]

Thus, the system Ax = 0 has only the trivial solution. It follows from Theorem 1.14
that $A^{-1}$ exists. Multiplying both sides of BA = I on the right by $A^{-1}$, we obtain

\[ BAA^{-1} = IA^{-1} \implies B = A^{-1}. \]

Theorem 1.17 Let A and B be square matrices of the same size. If AB is invertible,
then A and B must also be invertible.

Proof Since AB is invertible, there exists $(AB)^{-1}$ such that

\[ (AB)(AB)^{-1} = I \implies A[B(AB)^{-1}] = I, \]

and

\[ (AB)^{-1}(AB) = I \implies [(AB)^{-1}A]B = I. \]

By Theorem 1.16, both A and B are invertible.


Example Let A and B be square matrices of the same size. Show that if I - AB
is invertible, then I - BA is also invertible.

Proof By Theorem 1.16, we only need to find a matrix X such that (I - BA)X = I.
Actually,

\[
\begin{aligned}
I &= I - BA + BA = I - BA + BIA \\
&= I - BA + B(I - AB)(I - AB)^{-1}A \\
&= I - BA + (B - BAB)(I - AB)^{-1}A \\
&= I - BA + (I - BA)B(I - AB)^{-1}A \\
&= (I - BA)[I + B(I - AB)^{-1}A].
\end{aligned}
\]

Thus, I - BA is invertible and

\[ (I - BA)^{-1} = I + B(I - AB)^{-1}A. \]
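The identity just derived can be sanity-checked on a concrete pair of matrices. The sketch below assumes two hypothetical 2 x 2 integer matrices for which I - AB is invertible, and inverts 2 x 2 matrices with the formula of Theorem 1.8; all helper names are illustrative.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def inv2(M):
    """Inverse of a 2 x 2 matrix by the formula of Theorem 1.8."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

I = [[1, 0], [0, 1]]
A = [[1, 2], [0, 1]]    # hypothetical sample matrices
B = [[2, 0], [1, 1]]

# X = I + B (I - AB)^{-1} A, the claimed inverse of I - BA.
X = add(I, matmul(matmul(B, inv2(sub(I, matmul(A, B)))), A))
check = matmul(sub(I, matmul(B, A)), X)
print(check == I)  # True
```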


The following theorem shows that we can solve a certain linear system by using 
the inverse of its coefficient matrix. 


Theorem 1.18 Let A be an invertible n x n matrix. Then for each n x 1 matrix
b, the system Ax = b has exactly one solution, namely, $x = A^{-1}b$.

Proof Since A is invertible, there exists $A^{-1}$. First, $x = A^{-1}b$ is a solution since
$A(A^{-1}b) = (AA^{-1})b = b$. Next, for each n x 1 matrix b, let $x_0$ be
an arbitrary solution of Ax = b, i.e., $Ax_0 = b$. Multiplying both sides of $Ax_0 = b$
on the left by $A^{-1}$, we have

\[ A^{-1}Ax_0 = A^{-1}b. \]

Thus, $x_0 = A^{-1}b$ is the only solution of Ax = b.


Furthermore, we add two more equivalent statements into Theorem 1.14. 


Theorem 1.19 Let A be an n x n matrix. Then the following are equivalent.
(a) A is invertible.
(b) Ax = 0 has only the trivial solution.
(c) The reduced row-echelon form of A is $I_n$.
(d) A is expressible as a product of elementary matrices.
(e) Ax = b is consistent for every n x 1 matrix b.
(f) Ax = b has exactly one solution for every n x 1 matrix b.

Proof Since we know that (a), (b), (c), and (d) are equivalent, it is sufficient to
prove that (a) $\Rightarrow$ (f) $\Rightarrow$ (e) $\Rightarrow$ (a).

(a) $\Rightarrow$ (f): This was proved in Theorem 1.18.

(f) $\Rightarrow$ (e): This is self-evident.

(e) $\Rightarrow$ (a): If the system Ax = b is consistent for every n x 1 matrix b, then in
particular, the systems

\[ Ax = e_1, \quad Ax = e_2, \quad \ldots, \quad Ax = e_n \]

are consistent, where $e_i$ denotes the ith column matrix of $I_n$ for $1 \le i \le n$. Let
$x_1, x_2, \ldots, x_n$ be solutions of the respective systems, and let us form an n x n matrix
C having these solutions as columns:

\[ C = [\, x_1 \mid x_2 \mid \cdots \mid x_n \,]. \]

Then we have by (1.3),

\[ AC = [\, Ax_1 \mid Ax_2 \mid \cdots \mid Ax_n \,] = [\, e_1 \mid e_2 \mid \cdots \mid e_n \,] = I_n. \]

It follows from Theorem 1.16 (b) that $C = A^{-1}$. Thus, A is invertible.


1.7 Some Special Matrices 


Certain classes of matrices have special structures, which are useful in linear algebra
and also have many applications in practice.

1.7.1 Diagonal and triangular matrices


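The following square matrix is called a diagonal matrix:

\[ D = \begin{bmatrix} d_{11} & 0 & \cdots & 0 \\ 0 & d_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_{nn} \end{bmatrix}, \]

which is usually denoted by

\[ D = \mathrm{diag}(d_{11}, d_{22}, \ldots, d_{nn}). \]

If $d_{ii} \neq 0$ for $1 \le i \le n$, then

\[ D^{-1} = \begin{bmatrix} d_{11}^{-1} & 0 & \cdots & 0 \\ 0 & d_{22}^{-1} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_{nn}^{-1} \end{bmatrix} = \mathrm{diag}(d_{11}^{-1}, d_{22}^{-1}, \ldots, d_{nn}^{-1}). \]
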
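The following square matrices are called triangular matrices: a lower triangular
matrix

\[ L = \begin{bmatrix} a_{11} & 0 & 0 & \cdots & 0 \\ a_{21} & a_{22} & 0 & \cdots & 0 \\ a_{31} & a_{32} & a_{33} & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} \end{bmatrix} \qquad (a_{ij} = 0, \ i < j) \]

and an upper triangular matrix

\[ U = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ 0 & a_{22} & a_{23} & \cdots & a_{2n} \\ 0 & 0 & a_{33} & \cdots & a_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & a_{nn} \end{bmatrix} \qquad (a_{ij} = 0, \ i > j). \]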


Theorem 1.20 We have

(a) The transpose of a lower triangular matrix is upper triangular, and the
transpose of an upper triangular matrix is lower triangular.

(b) The product of lower triangular matrices is lower triangular, and the product
of upper triangular matrices is upper triangular.

(c) A triangular matrix is invertible if and only if its diagonal entries are all
nonzero.

(d) The inverse of an invertible lower triangular matrix is lower triangular, and
the inverse of an invertible upper triangular matrix is upper triangular.


Proof Part (a) is obvious. We defer the proof of (c) until the next chapter (after 
Theorem 2.8). Here we prove (b) and (d) only. 


For (b), we will prove the result for lower triangular matrices. The proof for upper
triangular matrices is similar. Let $A = [a_{ij}]$ and $B = [b_{ij}]$ be lower triangular n x n
matrices, and let $C = AB = [c_{ij}]$. Obviously, $a_{ij} = b_{ij} = 0$ for i < j. We can prove
that C is lower triangular by showing that $c_{ij} = 0$ for i < j. If i < j, then the terms
in the expression of $c_{ij}$ can be grouped as follows:

\[
c_{ij} = \underbrace{a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{i,j-1}b_{j-1,j}}_{\substack{\text{terms in which the row number of } b \\ \text{is less than the column number of } b}}
+ \underbrace{a_{ij}b_{jj} + a_{i,j+1}b_{j+1,j} + \cdots + a_{in}b_{nj}}_{\substack{\text{terms in which the row number of } a \\ \text{is less than the column number of } a}}.
\]

In the first grouping all of the b factors are zero since $b_{ij} = 0$ for i < j, and in the
second grouping all of the a factors are zero since $a_{ij} = 0$ for i < j. Thus, $c_{ij} = 0$
for i < j. It follows that C is a lower triangular matrix.


For (d), we again only prove the result for lower triangular matrices. Let $A = [a_{ij}]$
be an invertible lower triangular n x n matrix, where $a_{ij} = 0$ for i < j. From (c),
we know that $a_{ii} \neq 0$ for all i. Suppose that $B = [b_{ij}]$ is the inverse of A. Then

\[ AB = I. \tag{1.12} \]

We now compare the entries in both sides of (1.12) row by row. Beginning with the
first row, we have for j > 1,

\[ 0 = (I)_{1j} = a_{11}b_{1j} + a_{12}b_{2j} + \cdots + a_{1n}b_{nj} = a_{11}b_{1j} + 0 \cdot b_{2j} + \cdots + 0 \cdot b_{nj} = a_{11}b_{1j}, \]

which implies $b_{1j} = 0$. For the second row, we have for j > 2,

\[ 0 = (I)_{2j} = \sum_{t=1}^{n} a_{2t}b_{tj} = a_{21}b_{1j} + a_{22}b_{2j} = a_{22}b_{2j} \implies b_{2j} = 0. \]

By induction, we suppose that for the top k - 1 rows, $b_{ij} = 0$ if i < j and $i < k \le n$.
For the kth row, we have for j > k,

\[ 0 = (I)_{kj} = \sum_{t=1}^{n} a_{kt}b_{tj} = \sum_{t=1}^{k} a_{kt}b_{tj} = a_{kk}b_{kj} \implies b_{kj} = 0. \]

In particular, $b_{n-1,n} = 0$. Therefore, B is also a lower triangular matrix.
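Part (b) of Theorem 1.20 is easy to observe numerically. The following sketch multiplies two hypothetical lower triangular 3 x 3 matrices and checks that every entry above the diagonal of the product is zero; the helper names are assumptions.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def is_lower_triangular(M):
    """True if every entry above the main diagonal (row i < column j) is zero."""
    n = len(M)
    return all(M[i][j] == 0 for i in range(n) for j in range(i + 1, n))

L1 = [[1, 0, 0],
      [2, 3, 0],
      [4, 5, 6]]      # hypothetical lower triangular matrices
L2 = [[7, 0, 0],
      [8, 9, 0],
      [1, 2, 3]]

P = matmul(L1, L2)
print(P)                       # [[7, 0, 0], [38, 27, 0], [74, 57, 18]]
print(is_lower_triangular(P))  # True
```

Note that the diagonal of the product consists of the products of the corresponding diagonal entries.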
1.7.2 Symmetric matrix

A square matrix A is called symmetric if $A = A^T$. A square matrix A is called
skew-symmetric if $A = -A^T$.


Example Consider the matrices 


eet 2 0 1 0 -1 4 
A= ok B=/|0 5 -2], C=] 1 03 
par 7 ee ae 


Then A and B are symmetric and C is skew-symmetric since 


AT = : = A, ZRI ; pe gal ‘ A E GC 
p —2 7| ee ol 


Theorem 1.21 Let A and B be symmetric matrices of the same size. Then

(a) $A^T$ is symmetric.

(b) A + B and A - B are symmetric.

(c) kA is symmetric, where k is any scalar.

(d) AB is symmetric if and only if AB = BA.

(e) If A is invertible, then $A^{-1}$ is symmetric.

Proof Parts (a), (b), and (c) are obvious. Here we only prove (d) and (e).

For (d), since A and B are symmetric, it follows from Theorem 1.2 (d) that

\[ AB = (AB)^T \iff AB = B^TA^T \iff AB = BA. \]

For (e), if A is invertible and symmetric, then we have by Theorem 1.10,

\[ (A^{-1})^T = (A^T)^{-1} = A^{-1}. \]

Thus, $A^{-1}$ is symmetric.


Theorem 1.22 Let A be an arbitrary matrix. Then $AA^T$ and $A^TA$ are symmetric.
Furthermore, if A is square and invertible, then both $AA^T$ and $A^TA$ are symmetric
and invertible.

Proof It directly follows from Theorem 1.2 (d) and (a) that

\[ (AA^T)^T = (A^T)^TA^T = AA^T \quad \text{and} \quad (A^TA)^T = A^T(A^T)^T = A^TA. \]

If A is square and invertible, it follows from Theorem 1.10 that $A^T$ is also invertible.
By Theorem 1.9, we find that $AA^T$ and $A^TA$ are invertible.


Remark Every square matrix can be written as a sum of a symmetric matrix and 
a skew-symmetric matrix. This is one of the most famous results in matrix theory. 
See Exercise 1.35. 


Exercises 


Elementary exercises 


1.1 Determine which equations are linear in variables x, y, and z. If an equation 
is not linear, explain why not. 


(a) 2—ayt ¥5z=0. (b) a? +y24+27=1. 

(c) £t + Ty + z = sin(7/9). (d) 3cosx—4y+z= v3. 
(e) (cos3)x — 4y + z = V3. (f) x= —Txy + 3z. 

(g) zxy+z+1=0. (h) a? +y4+8z2=5. 


1.2 Determine whether each system has a unique solution, infinitely many solutions, 
or no solution. Then try to solve each system to confirm your answer. 


x + 5y = -—1 
r+y=0 
(a) (b) 9 -s+ y=-5 
2r +y=3. 
2x + 4y= 4. 
1.3 Find the augmented and coefficient matrices for each of the following linear 
systems. 
2x — 3y +5=0 5x1 + T2 — z4 + 2z5 = 1 
(a) 4x + 2y —2=0 (b) 3x + 243 — T4 =3 
3x +5z+3=0. 5x1 Ea 325 = 2. 


1.4 Consider

\[
\begin{aligned}
x + y + 3z &= a \\
2x - y + 2z &= b \\
x - 2y - z &= c.
\end{aligned}
\]

Show that if this system is consistent, then the constants a, b, and c must satisfy

\[ c = b - a. \]

1.5 For which values of a will the following system have no solution? Exactly one
solution? Infinitely many solutions?

\[
\begin{aligned}
x + 2y - 3z &= 5 \\
3x - y + 5z &= 1 \\
4x + y + (a^2 - 14)z &= a + 2.
\end{aligned}
\]


1.6 Determine whether each matrix is in reduced row-echelon form. 


13 5 12 3 
0130 0 0 1 -1 100 
. ip l 
E oa a >) }o 00 1 ROE iea 
00 0 001 


1.7 Solve the following systems by Gauss-Jordan elimination. 
2e—- y+2z=10 


z—2y+ z= 8 


382 - y+2z=11. 
5x — 2y + 6z=0 


(a) | 
(b) 
—2xr + yt3z=1. 
£1 + 2£2 + 3434+ 404 = —3 
zı + 2x2 — 5r4= 1 
(o) | ' 
(d) | 


221 he 4x as 323 T. 1924 = 6 


321 Tr 6x2 = 323 = 2424 = i: 
2x1 + 4x9 + 824+ 6x5 + 18%, — 16 = 0 


zti T 229 = 2X3 ale 3X5, = 


523 T 1024 ae 152z6 — 5= 


—2gzı — 4x2 + 5z3 + 2z4 — 6454+ 3zş— 1=0. 


A= 


0 1 1 0 
B= — 
aie Slag @ 


1 4 
(a) Is M = | on | a linear combination of A, B, and C? 


1 2 
(b) Is N = | yaa linear combination of A, B, and C? 




1.9 Let A € R**>, Be R**, C € R°*”, D e R**?, and E € R°**. Determine 
which of the following matrix operations can be performed. If so, find the size of 
each resulting matrix. 


(a) E(A+B). (b) AE+BC+D. (c) B(EA). 
1.10 Find AB and BA, where 
A=[1 3 Bil p=- 
1 


1.11 Let A, B, and C be square matrices of the same size. Find an example to
show that AB = AC does not imply B = C, i.e., an example with AB = AC but $B \neq C$.


1.12 Compute A = (6E) (5). where 


1.13 Let 


C= 


1 2 
+ ot a5 5 6 1 3 
ets E S 

1 3 


Using as few computations as possible, compute 
(a) The second row of DE. 
(b) The third column of DE. 


(c) The entry in row 2 and column 3 of C(DE). 


1.14 In each part, compute the product of A and B by the method of product 
of partitioned matrices in Subsection 1.3.4. Then check your results by multiplying 
AB directly. 


-1 2!1 5 AE: 

(a) A= 0 -3!4 2], B= l 
1 516 1 ier ee 
0 381-3 
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2 1! 4 
-1 2 1!5 
-3 5! 2 
(b) A= gee , B= T set 5 
1 5 6/1 Pont aie oe 
l 0 34-3 
2 —5 
1 3 2 -1 3 -4 
A= = 
(©) On 5f? FF 15 7 
1 4 
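1.15 Let A and B be partitioned as follows:

\[ A = [\, a_1 \mid a_2 \mid \cdots \mid a_n \,], \qquad B = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}, \]

where $a_i$ $(1 \le i \le n)$ are column matrices of A and $b_i$ $(1 \le i \le n)$ are row matrices
of B. Then AB can be expressed as

\[ AB = a_1b_1 + a_2b_2 + \cdots + a_nb_n. \tag{1.13} \]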


Based on (1.13), compute AB if 


1: 279 cane 
A= ; B= 1 2 
0 -1 1 j 


1.16 Let $A = [a_{ij}] \in \mathbb{R}^{n \times n}$. Find two n x 1 matrices x and y such that $x^TAy = a_{ij}$,
where $1 \le i \le n$ and $1 \le j \le n$.


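1.17 Let

\[ A = \begin{bmatrix} 3 & -2 & 7 \\ 6 & 5 & 4 \\ 0 & 4 & 9 \end{bmatrix}, \qquad B = \begin{bmatrix} 6 & -2 & 4 \\ 0 & 1 & 3 \\ 7 & 7 & 5 \end{bmatrix}. \]

Compute

(a) $\mathrm{tr}(3A - 5B^T)$. (b) $\mathrm{tr}(A^2)$. (c) $\mathrm{tr}(AB)$.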
1.18 Show that $\mathrm{tr}(AA^T) = 0$ if and only if A = 0.

1.19 What is $M^T$ for the partitioned matrix

\[ M = \begin{bmatrix} A & B \\ C & D \end{bmatrix}? \]

1.20 Prove Theorem 1.4 except (c).
1.21 Prove Theorem 1.6. 


1.22 Use Theorem 1.8 to compute the inverses of the following matrices. 


3 1 cos sin 
A= . b) B= ; 
= 5 J (b) — sin oe 
1.23 Find

\[ \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}^n, \]

where n is any positive integer.


1.24 Let A be a square matrix. Show that if $A^2 = A$ and A is invertible, then
A = I.

1.25 Let A be a square matrix. Show that if $A^4 = 0$, then

\[ (I - A)^{-1} = I + A + A^2 + A^3. \]

1.26 Let A be a square matrix. Show that if $A^2 - 3A + 4I = 0$, then A + I is
invertible. Find $(A + I)^{-1}$.

1.27 Let $A, B \in \mathbb{R}^{n \times n}$. Show that if A is invertible and AB = 0, then B = 0.

1.28 Show that $P^T = P^{-1}$ for any permutation matrix P.


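1.29 Let A be a square matrix partitioned as

\[ A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}, \]

where $A_{11}$ and $A_{22}$ are square matrices. Find a permutation matrix P such that
$P^TAP = B$, where

\[ B = \begin{bmatrix} A_{22} & A_{21} \\ A_{12} & A_{11} \end{bmatrix}. \]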


1.30 Prove Theorem 1.13. 


1.31 Express each of the following matrices as a product of elementary matrices. 


1 0 
(a) E al (b) 


e O O O 
oor oo 
oN Oo oO 
D (Oink 
1.32 Find the inverses of the following matrices.


api 1 2 3 4 
1 3 1 3 
(a) -1 2 3]. (b) 
0 1 0 2 
1 3 1 
I SDT 
1.33 Find a matrix A such that 
2 = 
5 ya 4 6 
1 3 3 1 


1.34 Let $A, B \in \mathbb{R}^{n \times n}$. Show that tr(AB) = 0 if A is symmetric and B is
skew-symmetric.

1.35 Let $A \in \mathbb{R}^{n \times n}$. Show that A can be written as A = H + K, where H is a
symmetric matrix and K is a skew-symmetric matrix.


Challenge exercises 


1.36 Determine the value of $\lambda$ such that the following linear system has only the
trivial solution.


AL T2 T3 = 0 


os as na Ax2 ape 23 = 0 


t+ t2+2%3=0. 
1.37 Let $A, B \in \mathbb{R}^{n \times n}$. If AB = 0, show that for any positive integer k,

\[ \mathrm{tr}[(A + B)^k] = \mathrm{tr}(A^k) + \mathrm{tr}(B^k). \]

1.38 Show that if $A, B, C, D \in \mathbb{R}^{n \times n}$ are such that ABCD = I, then

ABCD = DABC = CDAB = BCDA = I.


1.39 Find the inverses of the following matrices. 


w ' 1 11 
eee 01 11 
10 11 
(a) | 1 2 3 3 (b) 
11 01 

12 3 n 
11 10 




1.40 Find the inverse of the 3 x 3 Vandermonde matrix

\[ \begin{bmatrix} 1 & 1 & 1 \\ a & b & c \\ a^2 & b^2 & c^2 \end{bmatrix}, \]

where a, b, and c are pairwise distinct scalars.


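1.41 Let $A, B, C, X, Y, Z \in \mathbb{R}^{n \times n}$. If $A^{-1}$ and $C^{-1}$ exist, find the inverses of the
following partitioned matrices.

\[ \text{(a)} \ \begin{bmatrix} A & B \\ 0 & C \end{bmatrix}. \qquad \text{(b)} \ \begin{bmatrix} I & X & Y \\ 0 & I & Z \\ 0 & 0 & I \end{bmatrix}. \]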
1.42 Let $A, B \in \mathbb{R}^{n \times n}$. Show that if A + B is invertible, then

\[ A(A + B)^{-1}B = B(A + B)^{-1}A. \]

1.43 Let $A \in \mathbb{R}^{n \times n}$. Show that if AB = BA for all $B \in \mathbb{R}^{n \times n}$, then A = cI, where
c is a scalar.

1.44 Let A be a skew-symmetric matrix. Show that

(a) I - A is invertible.

(b) $(I - A)^{-1}(I + A) = (I + A)(I - A)^{-1}$.

(c) $M^TM = I$, where $M = (I - A)^{-1}(I + A)$.

(d) I + M is invertible.

1.45 Let A, B € R”*”. Show that if A? = 2I and B = A? — 2A + 3I, then B is 
invertible. Find B7. 


Chapter 2 


Determinants 


“The purpose of computation is insight, not numbers.” 


— Richard Hamming 


In this chapter, we introduce the determinant of any square matrix, which actually
is a function f defined on $\mathbb{R}^{n \times n}$ in the sense that it associates a number $f(A) \in \mathbb{R}$
with any $A \in \mathbb{R}^{n \times n}$. We then study some fundamental properties of determinant
functions and discuss their applications to linear systems and matrices.


2.1 Determinant Function 
We begin with the following definitions before we introduce the determinant function. 


2.1.1 Permutation, inversion, and elementary product 


Definition A permutation of the set {1, 2, ..., n}, denoted by $(j_1, j_2, \ldots, j_n)$, is
an arrangement of {1, 2, ..., n} in some order without omissions or repetitions. An
inversion is said to occur in a permutation $(j_1, j_2, \ldots, j_n)$ whenever a larger integer
precedes a smaller one.


Remark A permutation is called even if the total number of inversions is an even 
integer and is called odd if the total number of inversions is an odd integer. For 
instance, the number of inversions in (2,4,3,1) is 4 and therefore it is an even 
permutation. The number of inversions in (4,2,3,1) is 5 and therefore it is an odd 
permutation. 
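Counting inversions is a purely mechanical task, as the following short sketch shows. The function names `inversions` and `parity` are assumptions; the two permutations are the ones discussed in the remark above.

```python
def inversions(perm):
    """Count the pairs of positions p < q with perm[p] > perm[q]."""
    return sum(1
               for p in range(len(perm))
               for q in range(p + 1, len(perm))
               if perm[p] > perm[q])

def parity(perm):
    """Return 'even' or 'odd' according to the number of inversions."""
    return "even" if inversions(perm) % 2 == 0 else "odd"

print(inversions((2, 4, 3, 1)), parity((2, 4, 3, 1)))  # 4 even
print(inversions((4, 2, 3, 1)), parity((4, 2, 3, 1)))  # 5 odd
```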


Definition An elementary product from an n x n matrix $A = [a_{ij}]$ means any
product of n entries from A, no two of which come from the same row or column,
i.e.,

\[ a_{i_1j_1}a_{i_2j_2} \cdots a_{i_nj_n}, \]

where $(i_1, i_2, \ldots, i_n)$ and $(j_1, j_2, \ldots, j_n)$ are permutations of the set {1, 2, ..., n}. A
signed elementary product from A is defined by

\[ (-1)^{\tau(i_1, i_2, \ldots, i_n) + \tau(j_1, j_2, \ldots, j_n)} a_{i_1j_1}a_{i_2j_2} \cdots a_{i_nj_n}, \tag{2.1} \]

where $\tau(i_1, i_2, \ldots, i_n)$ and $\tau(j_1, j_2, \ldots, j_n)$ denote the number of inversions in
$(i_1, i_2, \ldots, i_n)$ and $(j_1, j_2, \ldots, j_n)$, respectively.


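In fact, an n x n matrix A has $n \cdot (n-1) \cdots 2 \cdot 1 = n!$ elementary products. Note
that if the positions of any two elements in $a_{i_1j_1}a_{i_2j_2} \cdots a_{i_nj_n}$ are exchanged, the
sign in front of (2.1) remains unchanged. For instance, consider the following signed
elementary product

\[ (-1)^{\tau(i_1, \ldots, i_p, i_{p+1}, \ldots, i_n) + \tau(j_1, \ldots, j_p, j_{p+1}, \ldots, j_n)} a_{i_1j_1} \cdots a_{i_pj_p}a_{i_{p+1}j_{p+1}} \cdots a_{i_nj_n}. \tag{2.2} \]

If the positions of $a_{i_pj_p}$ and $a_{i_{p+1}j_{p+1}}$ are exchanged, then, since interchanging two
adjacent entries of a permutation changes its number of inversions by exactly one,

\[
\begin{aligned}
&(-1)^{\tau(i_1, \ldots, i_{p+1}, i_p, \ldots, i_n) + \tau(j_1, \ldots, j_{p+1}, j_p, \ldots, j_n)} a_{i_1j_1} \cdots a_{i_{p+1}j_{p+1}}a_{i_pj_p} \cdots a_{i_nj_n} \\
&\quad = (-1)^{\tau(i_1, \ldots, i_p, i_{p+1}, \ldots, i_n) \pm 1 + \tau(j_1, \ldots, j_p, j_{p+1}, \ldots, j_n) \pm 1} a_{i_1j_1} \cdots a_{i_{p+1}j_{p+1}}a_{i_pj_p} \cdots a_{i_nj_n} \\
&\quad = (-1)^{\tau(i_1, \ldots, i_p, i_{p+1}, \ldots, i_n) + \tau(j_1, \ldots, j_p, j_{p+1}, \ldots, j_n)} a_{i_1j_1} \cdots a_{i_{p+1}j_{p+1}}a_{i_pj_p} \cdots a_{i_nj_n}.
\end{aligned}
\tag{2.3}
\]

Comparing (2.3) with (2.2) shows that they are equal. Thus, we can rearrange
the order of $a_{i_1j_1}a_{i_2j_2} \cdots a_{i_nj_n}$ in (2.1) such that the permutation of row indexes is
(1, 2, ..., n). It follows that (2.1) can be rewritten as

\[ (-1)^{\tau(1, 2, \ldots, n) + \tau(j_1', j_2', \ldots, j_n')} a_{1j_1'}a_{2j_2'} \cdots a_{nj_n'} = (-1)^{\tau(j_1', j_2', \ldots, j_n')} a_{1j_1'}a_{2j_2'} \cdots a_{nj_n'}. \]

For simplicity, later we usually use

\[ (-1)^{\tau(j_1, j_2, \ldots, j_n)} a_{1j_1}a_{2j_2} \cdots a_{nj_n}. \]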

Example We list all signed elementary products from the matrix

\[ A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}. \]

Elementary Product      Associated Permutation   Even or Odd   Signed Elementary Product
$a_{11}a_{22}a_{33}$    (1, 2, 3)                even          $a_{11}a_{22}a_{33}$
$a_{11}a_{23}a_{32}$    (1, 3, 2)                odd           $-a_{11}a_{23}a_{32}$
$a_{12}a_{21}a_{33}$    (2, 1, 3)                odd           $-a_{12}a_{21}a_{33}$
$a_{12}a_{23}a_{31}$    (2, 3, 1)                even          $a_{12}a_{23}a_{31}$
$a_{13}a_{21}a_{32}$    (3, 1, 2)                even          $a_{13}a_{21}a_{32}$
$a_{13}a_{22}a_{31}$    (3, 2, 1)                odd           $-a_{13}a_{22}a_{31}$

2.1.2 Definition of determinant function


We are now in a position to define the determinant function. 


Definition Let A be a square matrix. The determinant function is defined to be
the sum of all signed elementary products from A. This function (number), denoted
by det(A), is usually called the determinant of A.

More precisely, for $A = [a_{ij}] \in \mathbb{R}^{n \times n}$, we have

\[ \det(A) = \sum (-1)^{\tau(j_1, j_2, \ldots, j_n)} a_{1j_1}a_{2j_2} \cdots a_{nj_n} = \sum \pm a_{1j_1}a_{2j_2} \cdots a_{nj_n}. \tag{2.4} \]

Here $\sum$ indicates that the terms are summed over all permutations $(j_1, j_2, \ldots, j_n)$.

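Example We obtain

\[ \det \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = a_{11}a_{22} - a_{12}a_{21} \]

and

\[
\begin{aligned}
\det \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
&= a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} \\
&\quad - a_{11}a_{23}a_{32} - a_{12}a_{21}a_{33} - a_{13}a_{22}a_{31}.
\end{aligned}
\]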

2.2 Evaluation of Determinants 


In this section, we show that determinants can be evaluated by using row (or column) 
reduction. 


2.2.1 Elementary theorems 

Theorem 2.1 Let $A = [a_{ij}]$ be an n x n matrix. Then
(a) det(A) = 0 if A has a zero row (or column).
(b) $\det(A) = \det(A^T)$.
(c) $\det(A) = a_{11}a_{22} \cdots a_{nn}$ if A is a triangular matrix.


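Proof We only need to prove (b). The proofs of (a) and (c) are left as an exercise.
For a general term in det(A), we have by (2.1),

\[ (-1)^{\tau(i_1, i_2, \ldots, i_n) + \tau(j_1, j_2, \ldots, j_n)} a_{i_1j_1}a_{i_2j_2} \cdots a_{i_nj_n}. \]

It can be written as

\[ (-1)^{\tau(j_1', j_2', \ldots, j_n')} a_{1j_1'}a_{2j_2'} \cdots a_{nj_n'}, \]

and also as

\[ (-1)^{\tau(i_1', i_2', \ldots, i_n')} a_{i_1'1}a_{i_2'2} \cdots a_{i_n'n}. \]

Thus,

\[
\begin{aligned}
\det(A) &= \sum (-1)^{\tau(i_1, i_2, \ldots, i_n) + \tau(j_1, j_2, \ldots, j_n)} a_{i_1j_1}a_{i_2j_2} \cdots a_{i_nj_n} \\
&= \sum (-1)^{\tau(j_1', j_2', \ldots, j_n')} a_{1j_1'}a_{2j_2'} \cdots a_{nj_n'} \\
&= \sum (-1)^{\tau(i_1', i_2', \ldots, i_n')} a_{i_1'1}a_{i_2'2} \cdots a_{i_n'n} \\
&= \det(A^T).
\end{aligned}
\]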


Remark Part (c) in Theorem 2.1 shows that it is easy to evaluate the
determinant of a triangular matrix regardless of its size. A method proposed later
for evaluating determinants is to reduce a given matrix to a triangular matrix.


Since det(A) = det(AT), nearly every statement about determinants that 
contains rows is also true when rows are replaced by columns. 


Theorem 2.2 Let A be an n x n matrix.

(a) If B results when a single row (or column) of A is multiplied by any scalar
k, then

\[ \det(B) = k \cdot \det(A). \]

(b) If B results when two rows (or columns) of A are interchanged, then

\[ \det(B) = -\det(A). \]

(c) If A has two identical rows (or columns), then det(A) = 0.

(d) If A has two proportional rows (or columns), then det(A) = 0.

(e) If B is the matrix that results when a multiple of one row (or column) of A is
added to another row (or column), then det(B) = det(A).


Proof The proofs of (a), (c), (d), and (e) are left as an exercise. Here we only prove
(b). For simplicity, we first assume that B results when the first row of A is
interchanged with the second row of A. Then

\[
\begin{aligned}
\det(B) &= \sum (-1)^{\tau(j_1, j_2, j_3, \ldots, j_n)} b_{1j_1}b_{2j_2}b_{3j_3} \cdots b_{nj_n} \\
&= \sum (-1)^{\tau(j_1, j_2, j_3, \ldots, j_n)} a_{2j_1}a_{1j_2}a_{3j_3} \cdots a_{nj_n} \\
&= \sum (-1)^{\tau(j_1, j_2, j_3, \ldots, j_n)} a_{1j_2}a_{2j_1}a_{3j_3} \cdots a_{nj_n} \\
&= \sum (-1)^{\tau(j_2, j_1, j_3, \ldots, j_n) + 1} a_{1j_2}a_{2j_1}a_{3j_3} \cdots a_{nj_n} \\
&= -\sum (-1)^{\tau(j_1, j_2, j_3, \ldots, j_n)} a_{1j_1}a_{2j_2}a_{3j_3} \cdots a_{nj_n} \\
&= -\det(A).
\end{aligned}
\]

Similarly, one can prove that (b) still holds if B results from interchanging any
other two rows of A.


2.2.2 A method for evaluating determinants 


Based on Theorems 2.1 and 2.2, the row (or column) reduction actually gives us 
a method to evaluate determinants by reducing the given matrix to a triangular 
matrix which can be computed easily. Here is an example. 


Example Evaluate the determinant of 


d0 e OE ao a ane SA 
QUU oN 
Doa NO 
ONG 


1 2 0 3 1 2 0 3 
0 0 2 2 0 0 2 2 
3 6 0 5 0 0 0 -4 


2.3 Properties of Determinants 


We develop some essential properties of determinants in this section. We will show 
that if A and B are square matrices of the same size, then 


det( AB) = det(A)det(B). 


In particular, a determinant test for the invertibility of a matrix is given.

2.3.1 Basic properties


Let A and B be n x n matrices and $k \in \mathbb{R}$. We consider possible relationships
between det(A), det(B), and

det(kA), det(A + B), det(AB).

Theorem 2.3 Let $A = [a_{ij}]$ be an n x n matrix and k be any scalar. We have

\[ \det(kA) = k^n \cdot \det(A). \]

Proof By noting that $kA = [ka_{ij}]$, it follows from (2.4) that

\[ \det(kA) = \sum \pm (ka_{1j_1})(ka_{2j_2}) \cdots (ka_{nj_n}) = k^n \sum \pm a_{1j_1}a_{2j_2} \cdots a_{nj_n} = k^n \cdot \det(A). \]


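Usually, $\det(A + B) \neq \det(A) + \det(B)$. For instance, if $A = B = I_2$, then

\[ \det(A + B) = \det \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} = 4 \neq \det \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + \det \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = 1 + 1 = 2. \]

However, we have the following theorem.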


Theorem 2.4 Let A, B, and C be n x n matrices that differ only in a single row, say
the rth row. Assume that the rth row of C can be obtained by adding corresponding
entries in the rth rows of A and B. Then

\[ \det(C) = \det(A) + \det(B). \]

The same result holds for columns.

Proof Let $A = [a_{ij}]$, $B = [b_{ij}]$, and $C = [c_{ij}]$. We assume that

\[ c_{ij} = a_{ij} = b_{ij} \ \text{ if } i \neq r; \qquad c_{rj} = a_{rj} + b_{rj} \ \text{ if } i = r. \]

It follows from (2.4) that

\[
\begin{aligned}
\det(C) &= \sum \pm c_{1j_1}c_{2j_2} \cdots c_{rj_r} \cdots c_{nj_n} \\
&= \sum \pm c_{1j_1}c_{2j_2} \cdots (a_{rj_r} + b_{rj_r}) \cdots c_{nj_n} \\
&= \sum \pm c_{1j_1}c_{2j_2} \cdots a_{rj_r} \cdots c_{nj_n} + \sum \pm c_{1j_1}c_{2j_2} \cdots b_{rj_r} \cdots c_{nj_n} \\
&= \sum \pm a_{1j_1}a_{2j_2} \cdots a_{rj_r} \cdots a_{nj_n} + \sum \pm b_{1j_1}b_{2j_2} \cdots b_{rj_r} \cdots b_{nj_n} \\
&= \det(A) + \det(B).
\end{aligned}
\]

Remark By using Theorem 2.4 and Theorem 2.2 (d), one can prove Theorem 2.2
(e) easily.


2.3.2 Determinant of a matrix product 
Let A and B be square matrices of the same size. We are now going to show that 


det(AB) = det(A)det(B). 


Lemma 2.1 Let k be any scalar. For three types of elementary matrices, we have

(a) det(E(i(k))) = k.
(b) det(E(i, j)) = -1.
(c) det(E(i, j(k))) = 1.

Proof It follows from Theorem 2.1 (c) that (a) and (c) are true.

For (b), we first note that det(I) = 1 by Theorem 2.1 (c) again. It follows from
Theorem 2.2 (b) that

det(E(i, j)) = -det(I) = -1.


Lemma 2.2 Let B be an n x n matrix and E be an n x n elementary matrix. Then

det(EB) = det(E)det(B).

Proof We consider three types of elementary matrices E(i(k)), E(i, j), and
E(i, j(k)). If E = E(i(k)), then by Theorem 1.12, E(i(k))B results from multiplying
the ith row of B by k. It follows from Theorem 2.2 (a) that

\[ \det(E(i(k))B) = k \cdot \det(B). \]

But from Lemma 2.1 (a), we have det(E(i(k))) = k. Thus,

\[ \det(E(i(k))B) = \det(E(i(k)))\det(B). \]

The proofs of the other two cases are similar to that of E(i(k)).


Remark It follows by repeated applications of Lemma 2.2 that if $B$ is an $n \times n$ matrix and $E_{(1)}, E_{(2)}, \ldots, E_{(r)}$ are $n \times n$ elementary matrices, then

$$\det(E_{(r)} \cdots E_{(2)} E_{(1)} B) = \det(E_{(r)}) \cdots \det(E_{(2)})\det(E_{(1)})\det(B). \tag{2.5}$$


The next theorem gives a determinant test for the invertibility of a matrix. 




Theorem 2.5 A square matrix $A$ is invertible if and only if $\det(A) \neq 0$.

Proof Let $E_{(1)}, E_{(2)}, \ldots, E_{(r)}$ be the elementary matrices that correspond to the elementary row operations that produce the reduced row-echelon form $R$ of $A$, i.e.,

$$R = E_{(r)} \cdots E_{(2)} E_{(1)} A.$$
We deduce by (2.5),
$$\det(R) = \det(E_{(r)}) \cdots \det(E_{(2)})\det(E_{(1)})\det(A). \tag{2.6}$$

But from Lemma 2.1 the determinants of elementary matrices are all nonzero. It follows from (2.6) that $\det(A) \neq 0$ if and only if $\det(R) \neq 0$.

If $A$ is invertible, then by Theorem 1.19 we have $R = I$, so $\det(R) = 1 \neq 0$ and consequently $\det(A) \neq 0$. Conversely, if $\det(A) \neq 0$, then $\det(R) \neq 0$, so $R$ cannot have a row of zeros. It follows from Theorem 1.6 that $R = I$ and therefore $A$ is invertible by Theorem 1.19 again.


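Theorem 2.6 Let $A$ and $B$ be square matrices of the same size. Then
$$\det(AB) = \det(A)\det(B).$$
Proof Let $R$ be the reduced row-echelon form of $A$. Then
$$R = E_{(r)} E_{(r-1)} \cdots E_{(2)} E_{(1)} A, \tag{2.7}$$

where $E_{(1)}, E_{(2)}, \ldots, E_{(r-1)}, E_{(r)}$ are the elementary matrices that correspond to the elementary row operations that produce $R$ from $A$. It follows from Theorem 1.6 that $R$ has a zero row or $R = I$. We have by (2.7),

$$A = E_{(1)}^{-1} E_{(2)}^{-1} \cdots E_{(r-1)}^{-1} E_{(r)}^{-1} R,$$

where $E_{(1)}^{-1}, E_{(2)}^{-1}, \ldots, E_{(r-1)}^{-1}, E_{(r)}^{-1}$ are still elementary matrices. Moreover,

$$AB = E_{(1)}^{-1} E_{(2)}^{-1} \cdots E_{(r-1)}^{-1} E_{(r)}^{-1} RB,$$

where either $RB$ has a zero row if $R$ has a zero row or $RB = B$ if $R = I$.

If $R$ has a zero row, then $\det(A) = 0$ and also $\det(RB) = 0$. Thus, we obtain by (2.5),

$$\begin{aligned}
\det(AB) &= \det\left(E_{(1)}^{-1} E_{(2)}^{-1} \cdots E_{(r-1)}^{-1} E_{(r)}^{-1} RB\right) \\
&= \det\left(E_{(1)}^{-1}\right)\det\left(E_{(2)}^{-1}\right) \cdots \det\left(E_{(r-1)}^{-1}\right)\det\left(E_{(r)}^{-1}\right)\det(RB) \\
&= 0 = \det(A)\det(B).
\end{aligned}$$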




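If $R = I$, then we deduce by (2.5) again,

$$\begin{aligned}
\det(AB) &= \det\left(E_{(1)}^{-1} E_{(2)}^{-1} \cdots E_{(r-1)}^{-1} E_{(r)}^{-1} B\right) \\
&= \det\left(E_{(1)}^{-1}\right)\det\left(E_{(2)}^{-1}\right) \cdots \det\left(E_{(r-1)}^{-1}\right)\det\left(E_{(r)}^{-1}\right)\det(B) \\
&= \det\left(E_{(1)}^{-1} E_{(2)}^{-1} \cdots E_{(r-1)}^{-1} E_{(r)}^{-1}\right)\det(B) = \det(A)\det(B).
\end{aligned}$$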
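
The role that elementary row operations play in Lemma 2.1, formula (2.5), and Theorems 2.5 and 2.6 can be illustrated computationally. The sketch below is not the book's procedure: it reduces a matrix to triangular form rather than to reduced row-echelon form, and it tracks how each row operation changes the determinant. The function names `det_by_elimination` and `matmul` are hypothetical.

```python
from fractions import Fraction


def det_by_elimination(matrix):
    """Determinant via row reduction, tracking the effect of each row operation."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n = len(rows)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if rows[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)      # no pivot available: the matrix is not invertible
        if pivot != col:
            rows[col], rows[pivot] = rows[pivot], rows[col]
            det = -det              # interchanging two rows changes the sign
        det *= rows[col][col]       # collect the diagonal entry of the triangular form
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            # adding a multiple of one row to another leaves the determinant unchanged
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return det


def matmul(A, B):
    """Matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


A = [[2, 1, 0], [-1, -3, 2], [4, 3, -1]]
B = [[0, 2, 0], [-1, 0, 0], [0, 0, 2]]
S = [[1, 2], [2, 4]]                # second row is a multiple of the first

print(det_by_elimination(matmul(A, B)) == det_by_elimination(A) * det_by_elimination(B))
print(det_by_elimination(S) == 0)   # S is not invertible
```

Exact `Fraction` arithmetic is used so that the comparison in Theorem 2.6 is not blurred by rounding.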


Remark The proof of Theorem 1.17 can be given easily by using Theorems 2.5 
and 2.6. 


Theorem 2.7 If $A$ is invertible, then
$$\det(A^{-1}) = \frac{1}{\det(A)}.$$
Proof Since $A^{-1}A = I$, it follows from Theorem 2.6 that

$$\det(A^{-1})\det(A) = \det(A^{-1}A) = \det(I) = 1.$$

Thus, the result holds.


2.3.3 Summary 


We conclude this section with the following theorem, which relates all of the major topics we have studied so far.

Theorem 2.8 Let $A$ be an $n \times n$ matrix. Then the following are equivalent.
(a) $A$ is invertible.
(b) $A\mathbf{x} = \mathbf{0}$ has only the trivial solution.
(c) The reduced row-echelon form of $A$ is $I_n$.
(d) $A$ is expressible as a product of elementary matrices.
(e) $A\mathbf{x} = \mathbf{b}$ is consistent for every $n \times 1$ matrix $\mathbf{b}$.
(f) $A\mathbf{x} = \mathbf{b}$ has exactly one solution for every $n \times 1$ matrix $\mathbf{b}$.
(g) $\det(A) \neq 0$.


Remark We now prove Theorem 1.20 (c). Let $A = [a_{ij}] \in \mathbb{R}^{n \times n}$ be a triangular matrix with diagonal entries $a_{11}, a_{22}, \ldots, a_{nn}$. From Theorem 2.1 (c) and Theorem 2.8, the matrix $A$ is invertible if and only if

$$\det(A) = a_{11} a_{22} \cdots a_{nn} \neq 0,$$

which is true if and only if the diagonal entries are all nonzero.

2.4 Cofactor Expansions and Cramer’s Rule


We introduce a method for evaluating determinants which is useful from a theoretical 
viewpoint. As a consequence of the method here, we obtain a formula in terms of 
determinants for the solution to a certain linear system with a square coefficient 
matrix. 


2.4.1 Cofactors 


Definition Let $A = [a_{ij}]$ be a square matrix. Then the minor of entry $a_{ij}$, denoted by $M_{ij}$, is defined to be the determinant of the submatrix that remains after the $i$th row and $j$th column are deleted from $A$. The number

$$C_{ij} := (-1)^{i+j} M_{ij}$$
is called the cofactor of $a_{ij}$.


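Example Consider the following matrix

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}.$$

Then the cofactors $C_{11}$ and $C_{12}$ are given as follows:

$$C_{11} = (-1)^{1+1} M_{11} = \det\begin{bmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{bmatrix} = a_{22}a_{33} - a_{23}a_{32}$$
and

$$C_{12} = (-1)^{1+2} M_{12} = -\det\begin{bmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{bmatrix} = a_{23}a_{31} - a_{21}a_{33}.$$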


2.4.2 Cofactor expansions 


Now, we introduce the method of cofactor expansions for evaluating determinants. 


Theorem 2.9 The determinant of an $n \times n$ matrix $A = [a_{ij}]$ can be evaluated by multiplying the entries in any row (or column) by their cofactors and adding the resulting products. More precisely, for each $1 \le i \le n$ and $1 \le j \le n$,
$$\det(A) = a_{i1}C_{i1} + a_{i2}C_{i2} + \cdots + a_{in}C_{in}$$
(cofactor expansion along the $i$th row)
and
$$\det(A) = a_{1j}C_{1j} + a_{2j}C_{2j} + \cdots + a_{nj}C_{nj}$$
(cofactor expansion along the $j$th column).

We omit the proof of Theorem 2.9. For readers who are interested in the proof, we refer to [19, pp. 72-76] and [10, pp. 236-237].


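Example 1 Let

$$A = \begin{bmatrix} 2 & 1 & 0 \\ -1 & -3 & 2 \\ 4 & 3 & -1 \end{bmatrix}.$$

Evaluate $\det(A)$ by cofactor expansion along the first row:

$$\det(A) = 2 \cdot \det\begin{bmatrix} -3 & 2 \\ 3 & -1 \end{bmatrix} - 1 \cdot \det\begin{bmatrix} -1 & 2 \\ 4 & -1 \end{bmatrix} + 0 \cdot \det\begin{bmatrix} -1 & -3 \\ 4 & 3 \end{bmatrix} = 2 \times (-3) - 1 \times (-7) + 0 = 1.$$
This agrees with the result obtained directly by using the definition of determinants.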


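Example 2 Let

$$A = \begin{bmatrix} 0 & 2 & 0 & \cdots & 0 \\ 0 & 0 & 3 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & n \\ 1 & 0 & 0 & \cdots & 0 \end{bmatrix},$$

where $n \ge 3$. One can easily obtain
$$\det(A) = (-1)^{n+1} n!$$
by using Theorem 2.9 (cofactor expansion along the first column).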


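Example 3 Let

$$A = \begin{bmatrix} x & y & 0 & \cdots & 0 & 0 \\ 0 & x & y & \cdots & 0 & 0 \\ 0 & 0 & x & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & x & y \\ y & 0 & 0 & \cdots & 0 & x \end{bmatrix} \in \mathbb{R}^{n \times n},$$

where $n \ge 3$. It follows from Theorem 2.9 again (cofactor expansion along the first column) that

$$\det(A) = x^n + (-1)^{n+1} y^n.$$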
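
Cofactor expansion translates directly into a short recursive routine. The sketch below expands along the first row (Theorem 2.9 allows any row or column) and checks Examples 1 to 3 for a few small sizes; the helper names `minor`, `det`, `example2`, and `example3` are illustrative and not from the text.

```python
from math import factorial


def minor(matrix, i, j):
    """Submatrix that remains after deleting row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(matrix) if r != i]


def det(matrix):
    """Cofactor expansion along the first row, applied recursively."""
    if len(matrix) == 1:
        return matrix[0][0]
    return sum(
        (-1) ** j * matrix[0][j] * det(minor(matrix, 0, j))
        for j in range(len(matrix))
    )


def example2(n):
    """Matrix of Example 2: 2, ..., n above the diagonal, 1 in the bottom-left corner."""
    A = [[0] * n for _ in range(n)]
    for i in range(n - 1):
        A[i][i + 1] = i + 2
    A[n - 1][0] = 1
    return A


def example3(n, x, y):
    """Matrix of Example 3: x on the diagonal, y above it and in the bottom-left corner."""
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = x
        A[i][(i + 1) % n] = y
    return A


print(det([[2, 1, 0], [-1, -3, 2], [4, 3, -1]]))                  # Example 1
print(all(det(example2(n)) == (-1) ** (n + 1) * factorial(n) for n in (3, 4, 5)))
print(all(det(example3(n, 2, 3)) == 2 ** n + (-1) ** (n + 1) * 3 ** n for n in (3, 4, 5)))
```

The recursion visits on the order of $n!$ products, which is why cofactor expansion is described as useful mainly from a theoretical viewpoint.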
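
2.4.3 Adjoint of a matrix
Let $A = [a_{ij}] \in \mathbb{R}^{n \times n}$ and $C_{ij}$ be the cofactor of $a_{ij}$. We first show that for $p \neq q$,
$$a_{p1}C_{q1} + a_{p2}C_{q2} + \cdots + a_{pn}C_{qn} = 0. \tag{2.8}$$

We now construct a new matrix $B = [b_{ij}]$, in which all the rows of $B$ are the same as those in $A$ except the $q$th row, which is replaced by the $p$th row in $A$ ($p \neq q$), i.e.,

$$\begin{bmatrix} b_{11} & b_{12} & b_{13} & \cdots & b_{1n} \\ \vdots & \vdots & \vdots & & \vdots \\ b_{p1} & b_{p2} & b_{p3} & \cdots & b_{pn} \\ \vdots & \vdots & \vdots & & \vdots \\ b_{q1} & b_{q2} & b_{q3} & \cdots & b_{qn} \\ \vdots & \vdots & \vdots & & \vdots \\ b_{n1} & b_{n2} & b_{n3} & \cdots & b_{nn} \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ \vdots & \vdots & \vdots & & \vdots \\ a_{p1} & a_{p2} & a_{p3} & \cdots & a_{pn} \\ \vdots & \vdots & \vdots & & \vdots \\ a_{p1} & a_{p2} & a_{p3} & \cdots & a_{pn} \\ \vdots & \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} \end{bmatrix} \begin{matrix} \\ \\ \leftarrow \text{row } p \\ \\ \leftarrow \text{row } q \\ \\ \\ \end{matrix}$$

Therefore, the $p$th and $q$th rows of $B$ are the same. Let $C'_{ij}$ be the cofactor of $b_{ij}$. In fact, for this fixed $q$, we have $C'_{qj} = C_{qj}$ for $1 \le j \le n$. It follows from Theorem 2.2 (c) and Theorem 2.9 (cofactor expansion along the $q$th row) that

$$0 = \det(B) = b_{q1}C'_{q1} + b_{q2}C'_{q2} + \cdots + b_{qn}C'_{qn} = a_{p1}C_{q1} + a_{p2}C_{q2} + \cdots + a_{pn}C_{qn}.$$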


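Thus, (2.8) holds. We have by Theorem 2.9 again,

$$a_{p1}C_{q1} + a_{p2}C_{q2} + \cdots + a_{pn}C_{qn} = \delta_{pq}\det(A), \tag{2.9}$$
where
$$\delta_{pq} = \begin{cases} 1 & \text{if } p = q; \\ 0 & \text{if } p \neq q. \end{cases}$$

Also we have
$$a_{1p}C_{1q} + a_{2p}C_{2q} + \cdots + a_{np}C_{nq} = \delta_{pq}\det(A).$$
We are now in the position to develop a formula for the inverse of an invertible matrix.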


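Definition Let $A = [a_{ij}]$ be an $n \times n$ matrix and $C_{ij}$ be the cofactor of $a_{ij}$. Then the adjoint of $A$ is defined by
$$\mathrm{adj}(A) := \begin{bmatrix} C_{11} & C_{21} & \cdots & C_{n1} \\ C_{12} & C_{22} & \cdots & C_{n2} \\ \vdots & \vdots & & \vdots \\ C_{1n} & C_{2n} & \cdots & C_{nn} \end{bmatrix}.$$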
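
Theorem 2.10 Let $A$ be an invertible matrix. Then

$$A^{-1} = \frac{1}{\det(A)}\,\mathrm{adj}(A).$$

Proof Let $A = [a_{ij}]$ be an $n \times n$ invertible matrix. First, we show that
$$A\,\mathrm{adj}(A) = \det(A)I.$$

We have by using (2.9),

$$A\,\mathrm{adj}(A) = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ a_{21} & \cdots & a_{2n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{bmatrix} \begin{bmatrix} C_{11} & \cdots & C_{n1} \\ C_{12} & \cdots & C_{n2} \\ \vdots & & \vdots \\ C_{1n} & \cdots & C_{nn} \end{bmatrix} = \begin{bmatrix} \det(A) & 0 & \cdots & 0 \\ 0 & \det(A) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \det(A) \end{bmatrix} = \det(A)I. \tag{2.10}$$

Since $A$ is invertible, $\det(A) \neq 0$. Therefore, (2.10) can be rewritten as

$$A\left[\frac{1}{\det(A)}\,\mathrm{adj}(A)\right] = I.$$

Thus, it follows from Theorem 1.16 (b) that

$$A^{-1} = \frac{1}{\det(A)}\,\mathrm{adj}(A).$$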
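
The formula of Theorem 2.10 can be tried out on the matrix of Example 1. The following is a minimal sketch under the assumption that determinants are computed by cofactor expansion along the first row; the names `minor`, `det`, `adjugate`, and `inverse` are hypothetical helpers introduced only for illustration.

```python
from fractions import Fraction


def minor(matrix, i, j):
    """Submatrix that remains after deleting row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(matrix) if r != i]


def det(matrix):
    """Cofactor expansion along the first row; the empty matrix has determinant 1."""
    if not matrix:
        return 1
    return sum(
        (-1) ** j * matrix[0][j] * det(minor(matrix, 0, j))
        for j in range(len(matrix))
    )


def adjugate(matrix):
    """adj(A): the entry in row i, column j is the cofactor C_ji (note the transpose)."""
    n = len(matrix)
    return [
        [(-1) ** (i + j) * det(minor(matrix, j, i)) for j in range(n)]
        for i in range(n)
    ]


def inverse(matrix):
    """A^{-1} = adj(A) / det(A), valid only when det(A) is nonzero."""
    d = det(matrix)
    if d == 0:
        raise ValueError("matrix is not invertible")
    return [[Fraction(entry, d) for entry in row] for row in adjugate(matrix)]


A = [[2, 1, 0], [-1, -3, 2], [4, 3, -1]]
A_inv = inverse(A)
product = [[sum(a * b for a, b in zip(row, col)) for col in zip(*A_inv)] for row in A]
print(product == [[1, 0, 0], [0, 1, 0], [0, 0, 1]])   # A times its inverse is I
```

The integer entries and `Fraction` division keep the check exact.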
2.4.4 Cramer’s rule 


Cramer’s rule provides a formula for representing the solution of a certain linear system.

Theorem 2.11 (Cramer’s Rule) Let $A\mathbf{x} = \mathbf{b}$ be a system of $n$ linear equations in $n$ unknowns such that $\det(A) \neq 0$. Then the system has a unique solution which is given by
$$x_1 = \frac{\det(A_{(1)})}{\det(A)}, \quad x_2 = \frac{\det(A_{(2)})}{\det(A)}, \quad \ldots, \quad x_n = \frac{\det(A_{(n)})}{\det(A)},$$
where $A_{(j)}$ is the matrix obtained by replacing the $j$th column of $A$ by the $n \times 1$ matrix

$$\mathbf{b} = [b_1, b_2, \ldots, b_n]^T.$$




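Proof Let $A = [a_{ij}]$ be an $n \times n$ matrix and $C_{ij}$ be the cofactor of $a_{ij}$. By using Theorems 2.5, 1.18, and 2.10, we know that the unique solution of $A\mathbf{x} = \mathbf{b}$ is given by

$$\mathbf{x} = A^{-1}\mathbf{b} = \frac{1}{\det(A)}\,\mathrm{adj}(A)\mathbf{b},$$
i.e.,
$$\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \frac{1}{\det(A)} \begin{bmatrix} C_{11} & C_{21} & \cdots & C_{n1} \\ C_{12} & C_{22} & \cdots & C_{n2} \\ \vdots & \vdots & & \vdots \\ C_{1n} & C_{2n} & \cdots & C_{nn} \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix} = \frac{1}{\det(A)} \begin{bmatrix} \sum_{i=1}^{n} C_{i1}b_i \\ \sum_{i=1}^{n} C_{i2}b_i \\ \vdots \\ \sum_{i=1}^{n} C_{in}b_i \end{bmatrix}. \tag{2.11}$$
Construct
$$A_{(j)} = \begin{bmatrix} a_{11} & \cdots & a_{1,j-1} & b_1 & a_{1,j+1} & \cdots & a_{1n} \\ a_{21} & \cdots & a_{2,j-1} & b_2 & a_{2,j+1} & \cdots & a_{2n} \\ \vdots & & \vdots & \vdots & \vdots & & \vdots \\ a_{n1} & \cdots & a_{n,j-1} & b_n & a_{n,j+1} & \cdots & a_{nn} \end{bmatrix}, \quad 1 \le j \le n.$$
We have by using cofactor expansion of $\det(A_{(j)})$ along the $j$th column,
$$\det(A_{(j)}) = \sum_{i=1}^{n} C_{ij}b_i, \quad 1 \le j \le n. \tag{2.12}$$

Thus, substituting (2.12) into (2.11) yields

$$x_j = \frac{\det(A_{(j)})}{\det(A)}, \quad 1 \le j \le n.$$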
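
Cramer's rule is easy to state in code, even though it is rarely the most efficient way to solve a system. The sketch below is illustrative only: the function name `cramer`, the helper names, and the sample right-hand sides are assumptions, and determinants are computed by cofactor expansion.

```python
from fractions import Fraction


def minor(matrix, i, j):
    """Submatrix that remains after deleting row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(matrix) if r != i]


def det(matrix):
    """Cofactor expansion along the first row."""
    if len(matrix) == 1:
        return matrix[0][0]
    return sum(
        (-1) ** j * matrix[0][j] * det(minor(matrix, 0, j))
        for j in range(len(matrix))
    )


def cramer(A, b):
    """Solve Ax = b by Cramer's rule; requires det(A) != 0."""
    d = det(A)
    if d == 0:
        raise ValueError("Cramer's rule needs det(A) != 0")
    solution = []
    for j in range(len(A)):
        # A_(j): replace the j-th column of A by b
        A_j = [row[:j] + [b_i] + row[j + 1:] for row, b_i in zip(A, b)]
        solution.append(Fraction(det(A_j), d))
    return solution


A = [[2, 1, 0], [-1, -3, 2], [4, 3, -1]]
b = [1, 2, 3]
x = cramer(A, b)
print(x)
# substitute back: every equation of Ax = b should hold exactly
print(all(sum(a * xi for a, xi in zip(row, x)) == bi for row, bi in zip(A, b)))
```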


Exercises 


Elementary exercises 


2.1 Find the number of inversions in each of the following permutations. 


(a) (3,2,4,1). (b) (2,4,1,3). (c) (4,1,3,5, 2). 

2.2 In the determinant of a matrix $A = [a_{ij}] \in \mathbb{R}^{6 \times 6}$, what are the signs of the terms $a_{23}a_{31}a_{42}a_{56}a_{14}a_{65}$ and $a_{32}a_{43}a_{14}a_{51}a_{66}a_{25}$, respectively?

2.3 Prove Theorem 2.1 (a) and (c). 


2.4 Prove Theorem 2.2 except (b). 


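2.5 Let
$$A = \begin{bmatrix} b & b & b & \cdots & b & b \\ 0 & 1 & 1 & \cdots & 1 & 1 \\ 0 & 0 & 1 & \cdots & 1 & 1 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1 & 1 \\ 0 & 0 & 0 & \cdots & 0 & 1 \end{bmatrix} \in \mathbb{R}^{n \times n},$$

where $b \neq 0$. Find $\det(A)$ and $A^{-1}$.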


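2.6 Evaluate the determinants of the following matrices.

(a) $\begin{bmatrix} 0 & 0 & \cdots & 0 & 1 \\ 0 & 0 & \cdots & 2 & 0 \\ \vdots & \vdots & & \vdots & \vdots \\ 0 & n-1 & \cdots & 0 & 0 \\ n & 0 & \cdots & 0 & 0 \end{bmatrix}$. (b) $\begin{bmatrix} 1 & 3 & 3 & \cdots & 3 \\ 3 & 2 & 3 & \cdots & 3 \\ 3 & 3 & 3 & \cdots & 3 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 3 & 3 & 3 & \cdots & n \end{bmatrix}$.

(c) $\begin{bmatrix} x_1 + 1 & x_1 & \cdots & x_1 \\ x_2 & x_2 + 1 & \cdots & x_2 \\ \vdots & \vdots & \ddots & \vdots \\ x_n & x_n & \cdots & x_n + 1 \end{bmatrix}$. (d) $\begin{bmatrix} x & y & \cdots & y \\ y & x & \cdots & y \\ \vdots & \vdots & \ddots & \vdots \\ y & y & \cdots & x \end{bmatrix}$.

(e) $\begin{bmatrix} x_1 - y_1 & x_1 - y_2 & \cdots & x_1 - y_n \\ x_2 - y_1 & x_2 - y_2 & \cdots & x_2 - y_n \\ \vdots & \vdots & & \vdots \\ x_n - y_1 & x_n - y_2 & \cdots & x_n - y_n \end{bmatrix}$.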
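2.7 Show that
$$\det\begin{bmatrix} b + c & c + a & a + b \\ b_1 + c_1 & c_1 + a_1 & a_1 + b_1 \\ b_2 + c_2 & c_2 + a_2 & a_2 + b_2 \end{bmatrix} = 2 \cdot \det\begin{bmatrix} a & b & c \\ a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \end{bmatrix}.$$

2.8 Let $\mathbf{u}$ and $\mathbf{v}$ be $n \times 1$ matrices and $B$ be an $n \times n$ matrix. Show that

$$\det\begin{bmatrix} B & -B\mathbf{u} \\ -\mathbf{v}^T B & \mathbf{v}^T B\mathbf{u} \end{bmatrix} = 0.$$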


2.9 Let $A$ be a matrix defined by $A = \mathbf{a}^T\mathbf{a}$, where $\mathbf{a} = [2, 0, -1]$. If $k$ is a positive integer, find
$$\det\left((2I - A)^k\right),$$

where $I$ is the $3 \times 3$ identity matrix.


2.10 Let 
a b c 
A=|/|d e f 
g i 
If det(A) = —7, find 
g d 
(a) det(A?). (b) det ((2A)~'). (c) det| 6 h e 
c i 
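2.11 Let
$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}, \quad C = \begin{bmatrix} a_{11} & b^{-1}a_{12} & b^{-2}a_{13} \\ ba_{21} & a_{22} & b^{-1}a_{23} \\ b^2a_{31} & ba_{32} & a_{33} \end{bmatrix},$$

where $b \neq 0$. Show that $\det(A) = \det(C)$.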
2.12 If A? = A, find all possible values of det(A). 


2.13 Let $A$ be an $n \times n$ skew-symmetric matrix. Show that $\det(A^T) = (-1)^n \det(A)$.


2.14 Let 
0 2 0 0 0 8 
A=]|-1 0 0], B= ]0 -2 0 
0 0 2 2 0 0 


Find det ((2A)~!B) and det ((B~!A7)?). 
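2.15 Find all values of $k$ so that each of the following matrices is invertible.

(a) $A = \begin{bmatrix} k & -k & 3 \\ 0 & k+1 & 1 \\ k & -8 & k-1 \end{bmatrix}$. (b) $B = \begin{bmatrix} k & k & 0 \\ k & 4 & k \\ 0 & k & k \end{bmatrix}$.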

2.16 Let
$$A = \begin{bmatrix} 1 & -3 & 4 \\ -2 & 1 & 3 \\ 7 & 6 & -1 \end{bmatrix}.$$

(a) Evaluate the determinant of $A$ by cofactor expansion along the first row.

(b) Evaluate the determinant of $A$ by cofactor expansion along the second column.
(c) Find $\mathrm{adj}(A)$.

(d) Find $A^{-1}$ by using Theorem 2.10.

(e) Find $\det\left((3A)^{-1} + \mathrm{adj}(2A)\right)$.


2.17 Let A € R4*4. The elements in the first row of A are 1,2, —3, 4. The cofactors 
of the elements of the third row of A are given by 6,2,9,5. Find the value of z. 


2.18 Suppose that A, B € R3*? such that adj(A)BA = 10BA — I3. If 


then find B. 
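2.19 Let $A \in \mathbb{R}^{n \times n}$. Show that if $A \neq 0$ and $\mathrm{adj}(A) = A^T$, then $A$ is invertible.
2.20 Let $A, B \in \mathbb{R}^{n \times n}$.

(a) Show that if $A$ is invertible, then $\mathrm{adj}(A)$ is invertible and

$$[\mathrm{adj}(A)]^{-1} = \frac{1}{\det(A)}A = \mathrm{adj}(A^{-1}).$$

(b) Show that

$$\det[\mathrm{adj}(A)] = [\det(A)]^{n-1}.$$
(c) If $\det(A) = 2$ and $\det(B) = -3$, find

$$\det[2 \cdot \mathrm{adj}(A)B^{-1}].$$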

Challenge exercises


2.21 Using the fact that 2093, 6992, 3496, and 989 are divisible by 23, show that 
the determinant of the following matrix is also divisible by 23 without computing 
the determinant directly. 


OWN Wb 
ok OO 
D-O WO eo 
ODN W 


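2.22 Let $A \in \mathbb{R}^{3 \times 3}$. Show that
$$\det(\lambda I_3 - A) = \lambda^3 - \lambda^2\,\mathrm{tr}(A) + \lambda\,\mathrm{tr}(\mathrm{adj}(A)) - \det(A).$$

2.23 Show that

$$\det\begin{bmatrix} a & b & c & d \\ -b & a & -d & c \\ -c & d & a & -b \\ -d & -c & b & a \end{bmatrix} = (a^2 + b^2 + c^2 + d^2)^2.$$

2.24 Show that

$$\det\begin{bmatrix} 1 & 1 & \cdots & 1 \\ a_1 & a_2 & \cdots & a_n \\ a_1^2 & a_2^2 & \cdots & a_n^2 \\ \vdots & \vdots & & \vdots \\ a_1^{n-2} & a_2^{n-2} & \cdots & a_n^{n-2} \\ a_1^{n-1} & a_2^{n-1} & \cdots & a_n^{n-1} \end{bmatrix} = \prod_{1 \le i < j \le n} (a_j - a_i),$$
where $a_1, a_2, \ldots, a_n$ are distinct scalars.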


2.25 Given four matrices $A \in \mathbb{R}^{n \times n}$, $B \in \mathbb{R}^{n \times k}$, $C \in \mathbb{R}^{k \times n}$, and $D \in \mathbb{R}^{k \times k}$, define

$$M = \begin{bmatrix} A & B \\ C & D \end{bmatrix}.$$

Show that
(a) If $B = 0$ and $C = 0$, then $\det(M) = \det(A)\det(D)$.

(b) If $B = 0$ or $C = 0$, then $\det(M) = \det(A)\det(D)$.

(c) If $A$ is invertible, then $\det(M) = \det(A)\det(D - CA^{-1}B)$.

2.26 Let $A, B \in \mathbb{R}^{n \times n}$. Show that if $A^2 = B^2 = I$ and $\det(A) + \det(B) = 0$, then $A + B$ is not invertible.

2.27 Let $A \in \mathbb{R}^{n \times n}$. Show that if $AA^T = I$ and $\det(A) < 0$, then $\det(A + I) = 0$.

2.28 Let $A = [a_{ij}] \in \mathbb{R}^{n \times n}$ be an upper triangular matrix. Show that the cofactor $C_{ij}$ of $a_{ij}$ is zero if $i < j$.


Chapter 3 


Euclidean Vector Spaces 


“An interesting feature of these codes is that they make a very intensive use of 
subroutines; the addition of two vectors, multiplication of a vector by a scalar, inner 
products, etc, are all coded in this way.” 

— James Hardy Wilkinson 


“ ‘Obvious’ is the most dangerous word in mathematics.” 
— Eric Temple Bell 


In the mid-seventeenth century, people started to use pairs of numbers to denote 
points in a plane and triples of numbers to denote points in a 3-dimensional 
space. Later, mathematicians recognized that they can apply a similar idea to high- 
dimensional spaces. For instance, an n-tuple of numbers can be used to represent a 
point in an n-dimensional space. In this chapter, we begin with the definition of the
n-vector space, followed by the definition of Euclidean n-space. We then introduce
linear transformations from $\mathbb{R}^n$ to $\mathbb{R}^m$ and study their properties.


3.1 Euclidean n-Space 


In this section, we first introduce definitions of the n-vector space and Euclidean 
n-space. Then we study some geometric properties of Euclidean n-space. 


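3.1.1 n-vector space

Let $\mathbb{R}^n := \{(a_1, a_2, \ldots, a_n) \mid a_i \in \mathbb{R}\}$, where an ordered $n$-tuple $(a_1, a_2, \ldots, a_n)$ is called a vector in $\mathbb{R}^n$. Two vectors $\mathbf{u} = (u_1, u_2, \ldots, u_n)$ and $\mathbf{v} = (v_1, v_2, \ldots, v_n)$ in $\mathbb{R}^n$ are called equal if

$$u_1 = v_1, \quad u_2 = v_2, \quad \ldots, \quad u_n = v_n.$$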


Definition Let $\mathbf{u} = (u_1, u_2, \ldots, u_n)$ and $\mathbf{v} = (v_1, v_2, \ldots, v_n)$ be two vectors in $\mathbb{R}^n$.

(i) The sum $\mathbf{u} + \mathbf{v}$ is defined by

$$\mathbf{u} + \mathbf{v} := (u_1 + v_1, u_2 + v_2, \ldots, u_n + v_n).$$

(ii) If $k$ is a scalar, the scalar multiplication $k\mathbf{u}$ is defined by

$$k\mathbf{u} := (ku_1, ku_2, \ldots, ku_n).$$

The set $\mathbb{R}^n$ with the operations of addition and scalar multiplication is called the n-vector space.


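The most important arithmetic properties of vector operations in $\mathbb{R}^n$ are listed in the following theorem. The proof of the theorem is straightforward and is left as an exercise.

Theorem 3.1 Let $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ be vectors in $\mathbb{R}^n$. Then
(a) $\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$.
(b) $\mathbf{u} + (\mathbf{v} + \mathbf{w}) = (\mathbf{u} + \mathbf{v}) + \mathbf{w}$.
(c) $\mathbf{u} + \mathbf{0} = \mathbf{0} + \mathbf{u} = \mathbf{u}$, where $\mathbf{0} = (0, 0, \ldots, 0)$.
(d) $\mathbf{u} + (-\mathbf{u}) = \mathbf{0}$, i.e., $\mathbf{u} - \mathbf{u} = \mathbf{0}$.
(e) $k(l\mathbf{u}) = (kl)\mathbf{u}$.
(f) $k(\mathbf{u} + \mathbf{v}) = k\mathbf{u} + k\mathbf{v}$.
(g) $(k + l)\mathbf{u} = k\mathbf{u} + l\mathbf{u}$.
(h) $1\mathbf{u} = \mathbf{u}$.
Here $k$ and $l$ are scalars in $\mathbb{R}$.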


3.1.2 Euclidean n-space 


To develop geometrical notions of distance, norm, and angle in $\mathbb{R}^n$, we begin with the following definition.

Definition Let $\mathbf{u} = (u_1, u_2, \ldots, u_n)$ and $\mathbf{v} = (v_1, v_2, \ldots, v_n)$ be any vectors in $\mathbb{R}^n$. Then the Euclidean inner product $\mathbf{u} \cdot \mathbf{v}$ is defined by

$$\mathbf{u} \cdot \mathbf{v} := u_1v_1 + u_2v_2 + \cdots + u_nv_n = \sum_{i=1}^{n} u_iv_i. \tag{3.1}$$

The vector space $\mathbb{R}^n$ with the Euclidean inner product is called Euclidean n-space.
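
Some arithmetic properties of the Euclidean inner product are listed in the following theorem.

Theorem 3.2 Let $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ be vectors in $\mathbb{R}^n$ and $k$ be any scalar. Then
(a) $\mathbf{u} \cdot \mathbf{v} = \mathbf{v} \cdot \mathbf{u}$.
(b) $(\mathbf{u} + \mathbf{v}) \cdot \mathbf{w} = \mathbf{u} \cdot \mathbf{w} + \mathbf{v} \cdot \mathbf{w}$.
(c) $(k\mathbf{u}) \cdot \mathbf{v} = k(\mathbf{u} \cdot \mathbf{v})$.
(d) $\mathbf{v} \cdot \mathbf{v} \ge 0$. Further, $\mathbf{v} \cdot \mathbf{v} = 0$ if and only if $\mathbf{v} = \mathbf{0}$.

Proof The proofs of (a) and (c) follow directly from the definition, so we only prove (b) and (d). Let

$$\mathbf{u} = (u_1, u_2, \ldots, u_n), \quad \mathbf{v} = (v_1, v_2, \ldots, v_n), \quad \mathbf{w} = (w_1, w_2, \ldots, w_n).$$

For (b), it follows directly from the definition of the Euclidean inner product that

$$\begin{aligned}
(\mathbf{u} + \mathbf{v}) \cdot \mathbf{w} &= (u_1 + v_1)w_1 + (u_2 + v_2)w_2 + \cdots + (u_n + v_n)w_n \\
&= (u_1w_1 + u_2w_2 + \cdots + u_nw_n) + (v_1w_1 + v_2w_2 + \cdots + v_nw_n) \\
&= \mathbf{u} \cdot \mathbf{w} + \mathbf{v} \cdot \mathbf{w}.
\end{aligned}$$

For (d), we have

$$\mathbf{v} \cdot \mathbf{v} = \sum_{i=1}^{n} v_i^2 \ge 0.$$

Furthermore,

$$\sum_{i=1}^{n} v_i^2 = 0 \iff v_i = 0, \quad 1 \le i \le n.$$

Thus, $\mathbf{v} \cdot \mathbf{v} = 0$ if and only if $\mathbf{v} = \mathbf{0}$.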


3.1.3 Norm, distance, angle, and orthogonality 


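Definition The Euclidean norm (or Euclidean length) of a vector $\mathbf{u} = (u_1, u_2, \ldots, u_n)$ in $\mathbb{R}^n$ is defined by

$$\|\mathbf{u}\| := (\mathbf{u} \cdot \mathbf{u})^{1/2} = \sqrt{u_1^2 + u_2^2 + \cdots + u_n^2}.$$

The distance between $\mathbf{u} = (u_1, u_2, \ldots, u_n)$ and $\mathbf{v} = (v_1, v_2, \ldots, v_n)$ is defined by

$$d(\mathbf{u}, \mathbf{v}) := \|\mathbf{u} - \mathbf{v}\| = \sqrt{(u_1 - v_1)^2 + (u_2 - v_2)^2 + \cdots + (u_n - v_n)^2}.$$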
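
The next theorem provides one of the most important inequalities in matrix theory: the Cauchy-Schwarz inequality.

Theorem 3.3 (Cauchy-Schwarz Inequality in $\mathbb{R}^n$) Let $\mathbf{u}$ and $\mathbf{v}$ be vectors in $\mathbb{R}^n$. Then
$$|\mathbf{u} \cdot \mathbf{v}| \le \|\mathbf{u}\| \cdot \|\mathbf{v}\|. \tag{3.2}$$

Proof If $\mathbf{u} = \mathbf{0}$ or $\mathbf{v} = \mathbf{0}$, then the theorem is obviously true. Now assume $\mathbf{u} \neq \mathbf{0}$ and $\mathbf{v} \neq \mathbf{0}$. Construct a new vector

$$\mathbf{r} = \mathbf{u} + t\mathbf{v}, \quad t \in \mathbb{R}.$$

We have

$$0 \le \mathbf{r} \cdot \mathbf{r} = (\mathbf{u} + t\mathbf{v}) \cdot (\mathbf{u} + t\mathbf{v}) = \mathbf{u} \cdot \mathbf{u} + 2(\mathbf{u} \cdot \mathbf{v})t + (\mathbf{v} \cdot \mathbf{v})t^2.$$
Since this quadratic function of $t$ is nonnegative for every $t$, its discriminant $\Delta$ satisfies
$$\Delta = (2\mathbf{u} \cdot \mathbf{v})^2 - 4(\mathbf{u} \cdot \mathbf{u})(\mathbf{v} \cdot \mathbf{v}) \le 0,$$

which implies
$$(\mathbf{u} \cdot \mathbf{v})^2 \le (\mathbf{u} \cdot \mathbf{u})(\mathbf{v} \cdot \mathbf{v}).$$

Thus,

$$|\mathbf{u} \cdot \mathbf{v}| \le \|\mathbf{u}\| \cdot \|\mathbf{v}\|.$$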


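Remark In $\mathbb{R}^2$, we know that two nonzero vectors $\mathbf{u}$ and $\mathbf{v}$ form an angle $\theta$, where $0 \le \theta \le \pi$. Then we have by the cosine formula,

$$\cos\theta = \frac{\|\mathbf{u}\|^2 + \|\mathbf{v}\|^2 - \|\mathbf{v} - \mathbf{u}\|^2}{2\|\mathbf{u}\| \cdot \|\mathbf{v}\|} = \frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\| \cdot \|\mathbf{v}\|}.$$

It follows from the Cauchy-Schwarz inequality (3.2) that the cosine of an angle $\theta$ between two nonzero vectors $\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{R}^n$ can also be defined by

$$\cos\theta := \frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\| \cdot \|\mathbf{v}\|}. \tag{3.3}$$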
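
The definitions of inner product, norm, and angle, together with the Cauchy-Schwarz inequality, can be tried out numerically. This is a minimal sketch with hypothetical function names `dot`, `norm`, and `angle`; the clamping step is only a guard against floating-point rounding and is not part of definition (3.3).

```python
import math


def dot(u, v):
    """Euclidean inner product (3.1)."""
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    """Euclidean norm: the square root of u . u."""
    return math.sqrt(dot(u, u))


def angle(u, v):
    """Angle between two nonzero vectors, from definition (3.3)."""
    cosine = dot(u, v) / (norm(u) * norm(v))
    # rounding can push the ratio slightly outside [-1, 1]; clamp before acos
    return math.acos(max(-1.0, min(1.0, cosine)))


u = (1, 2, 2)
v = (2, 0, 0)
print(dot(u, v), norm(u), norm(v))
print(abs(dot(u, v)) <= norm(u) * norm(v))   # Cauchy-Schwarz inequality (3.2)
print(angle((1, 0), (0, 1)))                 # orthogonal vectors: pi/2
```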


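The next two theorems are concerned with the basic properties of norm and distance in Euclidean n-space.

Theorem 3.4 Let $\mathbf{u}$ and $\mathbf{v}$ be vectors in $\mathbb{R}^n$ and $k$ be any scalar. Then
(a) $\|\mathbf{u}\| \ge 0$.
(b) $\|\mathbf{u}\| = 0$ if and only if $\mathbf{u} = \mathbf{0}$.
(c) $\|k\mathbf{u}\| = |k| \cdot \|\mathbf{u}\|$.
(d) $\|\mathbf{u} + \mathbf{v}\| \le \|\mathbf{u}\| + \|\mathbf{v}\|$. (Triangle inequality)

Proof From Theorems 3.2 (c) and (d), one can show that (a), (b), and (c) are true. Here we only prove (d). Based on (3.2), we have

$$\begin{aligned}
\|\mathbf{u} + \mathbf{v}\|^2 &= (\mathbf{u} + \mathbf{v}) \cdot (\mathbf{u} + \mathbf{v}) = \mathbf{u} \cdot \mathbf{u} + 2\mathbf{u} \cdot \mathbf{v} + \mathbf{v} \cdot \mathbf{v} \le \mathbf{u} \cdot \mathbf{u} + 2|\mathbf{u} \cdot \mathbf{v}| + \mathbf{v} \cdot \mathbf{v} \\
&\le \|\mathbf{u}\|^2 + 2\|\mathbf{u}\|\|\mathbf{v}\| + \|\mathbf{v}\|^2 = (\|\mathbf{u}\| + \|\mathbf{v}\|)^2.
\end{aligned}$$

Thus, (d) holds.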


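Theorem 3.5 Let $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ be vectors in $\mathbb{R}^n$. Then

(a) $d(\mathbf{u}, \mathbf{v}) \ge 0$.
(b) $d(\mathbf{u}, \mathbf{v}) = 0$ if and only if $\mathbf{u} = \mathbf{v}$.
(c) $d(\mathbf{u}, \mathbf{v}) = d(\mathbf{v}, \mathbf{u})$.
(d) $d(\mathbf{u}, \mathbf{v}) \le d(\mathbf{u}, \mathbf{w}) + d(\mathbf{w}, \mathbf{v})$. (Triangle inequality)

The proof of Theorem 3.5 is left as an exercise.

We now introduce the concept of orthogonality of vectors.

Definition Two vectors $\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{R}^n$ are called orthogonal if $\mathbf{u} \cdot \mathbf{v} = 0$.

Remark Two nonzero vectors $\mathbf{u}$ and $\mathbf{v}$ are orthogonal if and only if the angle $\theta$ between $\mathbf{u}$ and $\mathbf{v}$ defined by (3.3) is $\pi/2$.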


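Theorem 3.6 (Pythagorean Theorem in $\mathbb{R}^n$) Let $\mathbf{u}$ and $\mathbf{v}$ be orthogonal vectors in $\mathbb{R}^n$ with the Euclidean inner product. Then

$$\|\mathbf{u} + \mathbf{v}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2.$$

Proof Since $\mathbf{u} \cdot \mathbf{v} = 0$, we have

$$\|\mathbf{u} + \mathbf{v}\|^2 = (\mathbf{u} + \mathbf{v}) \cdot (\mathbf{u} + \mathbf{v}) = \|\mathbf{u}\|^2 + 2\mathbf{u} \cdot \mathbf{v} + \|\mathbf{v}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2.$$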


3.1.4 Some remarks 


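(1) A vector $\mathbf{u} = (u_1, u_2, \ldots, u_n) \in \mathbb{R}^n$ can be written in row matrix notation or column matrix notation if no confusion arises:

$$\mathbf{u} = [u_1, u_2, \ldots, u_n] \quad \text{or} \quad \mathbf{u} = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix}.$$

Notice that there is no essential difference between an n-vector and a $1 \times n$ matrix or an $n \times 1$ matrix from an algebraic viewpoint. Thus, in the following, we will use the notations above to denote any vector in $\mathbb{R}^n$ freely.

(2) The Euclidean inner product (3.1) of vectors
$$\mathbf{u} = (u_1, u_2, \ldots, u_n) \quad \text{and} \quad \mathbf{v} = (v_1, v_2, \ldots, v_n)$$

can also be written in the form of a matrix product:

$$\mathbf{u} \cdot \mathbf{v} = u_1v_1 + u_2v_2 + \cdots + u_nv_n = [u_1, u_2, \ldots, u_n] \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}.$$

(3) If the row matrices of an $m \times r$ matrix $A$ are $\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_m$ and the column matrices of an $r \times n$ matrix $B$ are $\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_n$, then by using the Euclidean inner product, the matrix product $AB$ can be expressed as

$$AB = \begin{bmatrix} \mathbf{r}_1 \cdot \mathbf{c}_1 & \mathbf{r}_1 \cdot \mathbf{c}_2 & \cdots & \mathbf{r}_1 \cdot \mathbf{c}_n \\ \mathbf{r}_2 \cdot \mathbf{c}_1 & \mathbf{r}_2 \cdot \mathbf{c}_2 & \cdots & \mathbf{r}_2 \cdot \mathbf{c}_n \\ \vdots & \vdots & & \vdots \\ \mathbf{r}_m \cdot \mathbf{c}_1 & \mathbf{r}_m \cdot \mathbf{c}_2 & \cdots & \mathbf{r}_m \cdot \mathbf{c}_n \end{bmatrix}.$$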

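3.2 Linear Transformations from $\mathbb{R}^n$ to $\mathbb{R}^m$

In this section, we study linear transformations from $\mathbb{R}^n$ to $\mathbb{R}^m$.

3.2.1 Linear transformations from $\mathbb{R}^n$ to $\mathbb{R}^m$

A transformation $T$ from $\mathbb{R}^n$ to $\mathbb{R}^m$ is a map which maps each point $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ in $\mathbb{R}^n$ to a unique point $T(\mathbf{x}) = \mathbf{w} = (w_1, w_2, \ldots, w_m)$ in $\mathbb{R}^m$. A linear transformation $T: \mathbb{R}^n \to \mathbb{R}^m$ is a map which is defined by linear equations of the form

$$\begin{aligned}
w_1 &= a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\
w_2 &= a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\
&\;\;\vdots \\
w_m &= a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n
\end{aligned}$$

or in matrix notation

$$\begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_m \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$
or more briefly by
$$\mathbf{w} = T(\mathbf{x}) = A\mathbf{x}.$$
The $m \times n$ matrix $A = [a_{ij}]$ is called the standard matrix for the linear transformation $T$ and $T$ is called multiplication by $A$. We sometimes denote this $T$ by $T_A$, i.e.,
$$T_A(\mathbf{x}) = A\mathbf{x}.$$

Hence $T_A$ is also called a matrix transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$.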


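Remark If $0$ is the $m \times n$ zero matrix, then for every vector $\mathbf{x}$ in $\mathbb{R}^n$, we have
$$T_0(\mathbf{x}) = 0\mathbf{x} = \mathbf{0} \in \mathbb{R}^m.$$

We call $T_0$ the zero transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$. Moreover, for any $m \times n$ matrix $A$ and the zero vector $\mathbf{0} \in \mathbb{R}^n$, we have

$$T_A(\mathbf{0}) = A\mathbf{0} = \mathbf{0} \in \mathbb{R}^m.$$
If $I$ is the $n \times n$ identity matrix, then for every vector $\mathbf{x} \in \mathbb{R}^n$,
$$T_I(\mathbf{x}) = I\mathbf{x} = \mathbf{x},$$

so multiplication by $I$ maps every vector in $\mathbb{R}^n$ into itself. We call $T_I$ the identity transformation on $\mathbb{R}^n$.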


3.2.2 Some important linear transformations 


Among the most important linear transformations on $\mathbb{R}^2$ and $\mathbb{R}^3$ are reflections, projections, and rotations. We now discuss such linear transformations $T$ one by one [1]. In the following, let $\mathbf{u} = (x, y) \in \mathbb{R}^2$ or $\mathbf{u} = (x, y, z) \in \mathbb{R}^3$ and we denote $\mathbf{w} = T(\mathbf{u})$ by $\mathbf{w} = (w_1, w_2)$ or $\mathbf{w} = (w_1, w_2, w_3)$.


(1) Reflection transformations.

Transformation | Equations | Standard Matrix
Reflection about the y-axis | $w_1 = -x$, $w_2 = y$ | $\begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}$
Reflection about the line $y = x$ | $w_1 = y$, $w_2 = x$ | $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$
Reflection about the xy-plane | $w_1 = x$, $w_2 = y$, $w_3 = -z$ | $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix}$

(2) Projection transformations.

Transformation | Equations | Standard Matrix
Orthogonal projection on the x-axis | $w_1 = x$, $w_2 = 0$ | $\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$
Orthogonal projection on the xy-plane | $w_1 = x$, $w_2 = y$, $w_3 = 0$ | $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$

(3) Rotation transformations.

Transformation | Equations | Standard Matrix
Rotation through an angle $\theta$ | $w_1 = x\cos\theta - y\sin\theta$, $w_2 = x\sin\theta + y\cos\theta$ | $\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$
Counterclockwise rotation about the x-axis with an angle $\theta$ | $w_1 = x$, $w_2 = y\cos\theta - z\sin\theta$, $w_3 = y\sin\theta + z\cos\theta$ | $\begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{bmatrix}$

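Remark In the xy-plane, by using the polar coordinates, we have $x = r\cos\alpha$ and $y = r\sin\alpha$. Thus,

$$w_1 = r\cos(\alpha + \theta) = r\cos\alpha\cos\theta - r\sin\alpha\sin\theta = x\cos\theta - y\sin\theta,$$

$$w_2 = r\sin(\alpha + \theta) = r\cos\alpha\sin\theta + r\sin\alpha\cos\theta = x\sin\theta + y\cos\theta.$$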
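
The polar-coordinate argument can be checked numerically against the standard matrix of the rotation. In this sketch the function names `rotation_matrix` and `apply` and the sample values of $r$, $\alpha$, and $\theta$ are arbitrary illustrative choices.

```python
import math


def rotation_matrix(theta):
    """Standard matrix of the rotation of the plane through the angle theta."""
    return [
        [math.cos(theta), -math.sin(theta)],
        [math.sin(theta), math.cos(theta)],
    ]


def apply(A, x):
    """Multiplication by A: returns the vector Ax."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


r, alpha, theta = 2.0, 0.3, 0.5
x = [r * math.cos(alpha), r * math.sin(alpha)]      # point with polar coordinates (r, alpha)
w = apply(rotation_matrix(theta), x)
expected = [r * math.cos(alpha + theta), r * math.sin(alpha + theta)]
print(all(math.isclose(a, b) for a, b in zip(w, expected)))
```

The rotated point keeps the radius $r$ and has polar angle $\alpha + \theta$, which is what the comparison checks.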


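(4) Contraction and dilation transformations.

Transformation | Equations | Standard Matrix
Contraction with factor $k$ on $\mathbb{R}^3$ ($0 \le k \le 1$) | $w_1 = kx$, $w_2 = ky$, $w_3 = kz$ | $\begin{bmatrix} k & 0 & 0 \\ 0 & k & 0 \\ 0 & 0 & k \end{bmatrix}$
Dilation with factor $k$ on $\mathbb{R}^3$ ($k \ge 1$) | $w_1 = kx$, $w_2 = ky$, $w_3 = kz$ | $\begin{bmatrix} k & 0 & 0 \\ 0 & k & 0 \\ 0 & 0 & k \end{bmatrix}$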

3.2.3 Compositions of linear transformations 


Let $T_A: \mathbb{R}^n \to \mathbb{R}^k$ and $T_B: \mathbb{R}^k \to \mathbb{R}^m$ be linear transformations. Then for each $\mathbf{x} \in \mathbb{R}^n$, one can first compute $T_A(\mathbf{x}) \in \mathbb{R}^k$, and then compute $T_B(T_A(\mathbf{x})) \in \mathbb{R}^m$. Hence the application of $T_A$ followed by $T_B$ produces a transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$. This transformation, denoted by $T_B \circ T_A$, is called the composition of $T_B$ with $T_A$. Actually, we have

$$(T_B \circ T_A)(\mathbf{x}) = T_B(T_A(\mathbf{x})) = B(A\mathbf{x}) = (BA)\mathbf{x}. \tag{3.4}$$

Thus, $T_B \circ T_A$ is multiplication by $BA$, which is also a linear transformation. Formula (3.4) tells us that the standard matrix for $T_B \circ T_A$ is $BA$, i.e.,

$$T_B \circ T_A = T_{BA}. \tag{3.5}$$

Remark Formula (3.5) reveals an important idea: multiplying matrices is equivalent to composing the corresponding linear transformations in the right-to-left order of the factors.

Compositions can be defined for three or more linear transformations. For instance, we consider the linear transformations
$$T_A: \mathbb{R}^n \to \mathbb{R}^k, \quad T_B: \mathbb{R}^k \to \mathbb{R}^l, \quad T_C: \mathbb{R}^l \to \mathbb{R}^m.$$
The composition $T_C \circ T_B \circ T_A: \mathbb{R}^n \to \mathbb{R}^m$ is given by
$$(T_C \circ T_B \circ T_A)(\mathbf{x}) = T_C(T_B(T_A(\mathbf{x}))) = C(B(A\mathbf{x})) = (CBA)\mathbf{x}.$$
Thus, the standard matrix for $T_C \circ T_B \circ T_A$ is $CBA$, which is a generalization of (3.5). This property can be extended to a finite number of linear transformations without any difficulty.
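
Formula (3.4) and the right-to-left order of the factors can be seen on a small example. The sketch below takes $T_A$ to be the reflection about the line $y = x$ and $T_B$ to be the orthogonal projection on the x-axis; the helper names `apply` and `matmul` are illustrative.

```python
def apply(A, x):
    """Multiplication by A: returns the vector Ax."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def matmul(A, B):
    """Matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


A = [[0, 1], [1, 0]]   # reflection about the line y = x
B = [[1, 0], [0, 0]]   # orthogonal projection on the x-axis
x = [2, 3]

print(apply(B, apply(A, x)))      # T_B(T_A(x)): reflect first, then project
print(apply(matmul(B, A), x))     # multiplication by BA gives the same vector
print(apply(matmul(A, B), x))     # multiplication by AB: project first, then reflect
```

The first two results agree, while the third differs, which shows that the order of composition matters in general.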


3.3 Properties of Transformations 


In this section, we study the linearity conditions and investigate the relationship between the invertibility of a matrix and properties of the corresponding matrix transformation.


3.3.1 Linearity conditions 


Theorem 3.7 A transformation $T: \mathbb{R}^n \to \mathbb{R}^m$ is linear if and only if the following linearity conditions hold for all vectors $\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{R}^n$ and every scalar $c$.

(a) $T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v})$.
(b) $T(c\mathbf{u}) = cT(\mathbf{u})$.

Proof If $T$ is a linear transformation, then it is easy to see that the linearity conditions hold. Conversely, if the linearity conditions hold, then for any vector $\mathbf{x} = [x_1, x_2, \ldots, x_n]^T \in \mathbb{R}^n$, we can express $\mathbf{x}$ by the following linear combination:

$$\mathbf{x} = \sum_{i=1}^{n} x_i\mathbf{e}_i,$$

where $\mathbf{e}_i$ is the $i$th column matrix of the $n \times n$ identity matrix for $1 \le i \le n$. By using the linearity conditions, we obtain

$$T(\mathbf{x}) = T\left(\sum_{i=1}^{n} x_i\mathbf{e}_i\right) = \sum_{i=1}^{n} x_iT(\mathbf{e}_i) = A\mathbf{x},$$

where the successive column matrices of $A$ are $T(\mathbf{e}_1), T(\mathbf{e}_2), \ldots, T(\mathbf{e}_n)$, i.e.,

$$A = \begin{bmatrix} T(\mathbf{e}_1) \mid T(\mathbf{e}_2) \mid \cdots \mid T(\mathbf{e}_n) \end{bmatrix}. \tag{3.6}$$

Thus, $T$ is a linear transformation and $A$ is the standard matrix for $T$.
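
Formula (3.6) gives a practical way to find the standard matrix: apply $T$ to each standard basis vector and use the images as columns. The following sketch assumes a hypothetical linear map `T` from $\mathbb{R}^3$ to $\mathbb{R}^2$ chosen only for illustration; `standard_matrix` and `apply` are likewise illustrative names.

```python
def standard_matrix(T, n):
    """Build A = [T(e_1) | T(e_2) | ... | T(e_n)] column by column."""
    columns = [T([1 if i == j else 0 for i in range(n)]) for j in range(n)]
    return [list(row) for row in zip(*columns)]   # transpose columns into rows


def apply(A, x):
    """Multiplication by A: returns the vector Ax."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def T(x):
    """A hypothetical linear map from R^3 to R^2, used only as an example."""
    return [x[0] + 2 * x[1], 3 * x[2] - x[1]]


A = standard_matrix(T, 3)
print(A)
print(apply(A, [1, 1, 1]) == T([1, 1, 1]))   # Ax agrees with T(x)
```

The construction only recovers $T$ when $T$ actually satisfies the linearity conditions of Theorem 3.7.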


3.3.2 Example 


Let $l$ be the line in the xy-plane that passes through the origin and makes an angle $\theta$ with the positive x-axis, where $0 \le \theta \le \pi/2$. As illustrated in Figure 3.1 (a), let $T: \mathbb{R}^2 \to \mathbb{R}^2$ be the linear transformation that maps each vector into its orthogonal projection on $l$.

(a) Find the standard matrix $A$ for $T$.

(b) Find the orthogonal projection of the vector $\mathbf{x} = [2, 3]^T$ onto the line through the origin that makes an angle of $\theta = \pi/6$ with the positive x-axis.


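Solution For (a), it follows from (3.6) that
$$A = \begin{bmatrix} T(\mathbf{e}_1) \mid T(\mathbf{e}_2) \end{bmatrix},$$

where $\mathbf{e}_1 = [1, 0]^T$ and $\mathbf{e}_2 = [0, 1]^T$.

Figure 3.1

Referring to Figure 3.1 (b), the length of the vector $T(\mathbf{e}_1)$ is given by $\|T(\mathbf{e}_1)\| = \cos\theta$, so

$$T(\mathbf{e}_1) = \begin{bmatrix} \|T(\mathbf{e}_1)\|\cos\theta \\ \|T(\mathbf{e}_1)\|\sin\theta \end{bmatrix} = \begin{bmatrix} \cos^2\theta \\ \sin\theta\cos\theta \end{bmatrix}.$$

Referring to Figure 3.1 (c), the length of the vector $T(\mathbf{e}_2)$ is given by $\|T(\mathbf{e}_2)\| = \sin\theta$, so

$$T(\mathbf{e}_2) = \begin{bmatrix} \|T(\mathbf{e}_2)\|\cos\theta \\ \|T(\mathbf{e}_2)\|\sin\theta \end{bmatrix} = \begin{bmatrix} \cos\theta\sin\theta \\ \sin^2\theta \end{bmatrix}.$$

Thus, the standard matrix $A$ for $T$ is
$$A = \begin{bmatrix} \cos^2\theta & \sin\theta\cos\theta \\ \sin\theta\cos\theta & \sin^2\theta \end{bmatrix}.$$

For (b), since $\sin(\pi/6) = 1/2$ and $\cos(\pi/6) = \sqrt{3}/2$, it follows from part (a) that the standard matrix $A$ for this projection transformation is

$$A = \begin{bmatrix} 3/4 & \sqrt{3}/4 \\ \sqrt{3}/4 & 1/4 \end{bmatrix}.$$

Thus, the orthogonal projection of the vector $\mathbf{x}$ is

$$T(\mathbf{x}) = T\left(\begin{bmatrix} 2 \\ 3 \end{bmatrix}\right) = \begin{bmatrix} 3/4 & \sqrt{3}/4 \\ \sqrt{3}/4 & 1/4 \end{bmatrix} \begin{bmatrix} 2 \\ 3 \end{bmatrix} = \begin{bmatrix} \dfrac{6 + 3\sqrt{3}}{4} \\[2mm] \dfrac{2\sqrt{3} + 3}{4} \end{bmatrix}.$$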

3.3.3 One-to-one transformations 


Definition A linear transformation $T: \mathbb{R}^n \to \mathbb{R}^m$ is said to be one-to-one if $T$ maps distinct vectors in $\mathbb{R}^n$ into distinct vectors in $\mathbb{R}^m$.


For the relationship between the invertibility of a square matrix and properties 
of corresponding linear transformation, we have the following theorem. 


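Theorem 3.8 Let $A$ be an $n \times n$ matrix and $T_A: \mathbb{R}^n \to \mathbb{R}^n$ be multiplication by $A$. Then the following statements are equivalent.

(a) $A$ is invertible.
(b) The range of $T_A$ is $\mathbb{R}^n$, where the range of $T_A$ is given by $\{T_A(\mathbf{x}) \mid \mathbf{x} \in \mathbb{R}^n\}$.
(c) $T_A$ is one-to-one.

Proof We first give three more equivalent statements (a'), (b'), and (c').

(a) $A$ is invertible. (a') $A\mathbf{x} = \mathbf{0} \Rightarrow \mathbf{x} = \mathbf{0}$.
(b) The range of $T_A$ is $\mathbb{R}^n$. (b') $A\mathbf{x} = \mathbf{w}$ is consistent for all $\mathbf{w} \in \mathbb{R}^n$.
(c) $T_A$ is one-to-one. (c') $T_A(\mathbf{x}) = \mathbf{0} \Rightarrow \mathbf{x} = \mathbf{0}$.

Then by using Theorem 2.8 and the definition of linear transformations, one can easily see that
$$\text{(a)} \Leftrightarrow \text{(a')} \Leftrightarrow \text{(b')} \Leftrightarrow \text{(b)}, \qquad \text{(a')} \Leftrightarrow \text{(c')}.$$

Therefore, in order to complete the proof of the theorem, we only need to prove (c) $\Leftrightarrow$ (c').

(c) $\Rightarrow$ (c'): If $T_A$ is one-to-one, then for any nonzero $\mathbf{x} \in \mathbb{R}^n$,
$$T_A(\mathbf{x}) \neq T_A(\mathbf{0}) = \mathbf{0}.$$
Thus, (c') holds.

(c') $\Rightarrow$ (c): Let $\mathbf{x}_1, \mathbf{x}_2 \in \mathbb{R}^n$ and $\mathbf{x}_1 \neq \mathbf{x}_2$. We want to show that $T_A(\mathbf{x}_1) \neq T_A(\mathbf{x}_2)$. By contradiction, we assume that $T_A(\mathbf{x}_1) = T_A(\mathbf{x}_2)$. Then

$$T_A(\mathbf{x}_1 - \mathbf{x}_2) = T_A(\mathbf{x}_1) - T_A(\mathbf{x}_2) = \mathbf{0}.$$

By the given condition, we have $\mathbf{x}_1 - \mathbf{x}_2 = \mathbf{0}$ and then $\mathbf{x}_1 = \mathbf{x}_2$, which contradicts the fact that $\mathbf{x}_1 \neq \mathbf{x}_2$. Thus, (c) holds.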


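Remark Let $T_A: \mathbb{R}^n \to \mathbb{R}^n$ be a one-to-one linear transformation. Then it follows from Theorem 3.8 that the matrix $A$ is invertible. Thus, $T_{A^{-1}}: \mathbb{R}^n \to \mathbb{R}^n$ is itself a linear transformation and it is called the inverse of $T_A$. In fact, the linear transformations $T_A$ and $T_{A^{-1}}$ cancel the effect of one another. More precisely, for all $\mathbf{x}$ in $\mathbb{R}^n$,

$$T_A(T_{A^{-1}}(\mathbf{x})) = AA^{-1}\mathbf{x} = I\mathbf{x} = \mathbf{x}, \qquad T_{A^{-1}}(T_A(\mathbf{x})) = A^{-1}A\mathbf{x} = I\mathbf{x} = \mathbf{x},$$
or equivalently,
$$T_A \circ T_{A^{-1}} = T_{AA^{-1}} = T_I, \qquad T_{A^{-1}} \circ T_A = T_{A^{-1}A} = T_I.$$

3.3.4 Summary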


Theorem 3.9 Let $A$ be an $n \times n$ matrix and $T_A: \mathbb{R}^n \to \mathbb{R}^n$ be multiplication by $A$. Then the following are equivalent.

(a) $A$ is invertible.
(b) $A\mathbf{x} = \mathbf{0}$ has only the trivial solution.
(c) The reduced row-echelon form of $A$ is $I_n$.
(d) $A$ is expressible as a product of elementary matrices.
(e) $A\mathbf{x} = \mathbf{b}$ is consistent for every $n \times 1$ matrix $\mathbf{b}$.
(f) $A\mathbf{x} = \mathbf{b}$ has exactly one solution for every $n \times 1$ matrix $\mathbf{b}$.
(g) $\det(A) \neq 0$.
(h) The range of $T_A$ is $\mathbb{R}^n$.
(i) $T_A$ is one-to-one.


Exercises 


Elementary exercises 
3.1 Prove Theorem 3.1. 


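3.2 Let $\mathbf{v}_1 = [1, 0, 1, 0]$, $\mathbf{v}_2 = [0, 1, 0, 2]$, $\mathbf{v}_3 = [2, 0, 0, 1]$, and $\mathbf{v}_4 = [0, -2, -3, 0]$. Find $k_1$, $k_2$, $k_3$, and $k_4$ such that

$$k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + k_3\mathbf{v}_3 + k_4\mathbf{v}_4 = [-4, 1, -5, 5].$$
3.3 Find the inner product $\mathbf{u} \cdot \mathbf{v}$.
(a) $\mathbf{u} = [4, 2, -7]$, $\mathbf{v} = [-1, 2, 5]$. (b) $\mathbf{u} = [-2, 8, 4, -7]$, $\mathbf{v} = [5, -1, -3, 2]$.
3.4 Let $\mathbf{u} = [4, 1, 2, 3]$, $\mathbf{v} = [0, 3, 8, -2]$, and $\mathbf{w} = [3, 1, 2, 2]$. Evaluate each expression.

(a) $\|3\mathbf{u} - 5\mathbf{v} + \mathbf{w}\|$. (b) $\dfrac{1}{\|\mathbf{w}\|}\mathbf{w}$.

3.5 Find $\mathbf{u} \cdot \mathbf{v}$ if $\|\mathbf{u} + \mathbf{v}\| = 1$ and $\|\mathbf{u} - \mathbf{v}\| = 5$.

3.6 Find $\|\mathbf{u} + \mathbf{v}\|$ if $\|\mathbf{u}\| = \|\mathbf{v}\| = \|\mathbf{u} - \mathbf{v}\| = 2\sqrt{2}$.

3.7 Let $\mathbf{u}, \mathbf{v} \in \mathbb{R}^n$. Show that

(a) $\|\mathbf{u} + \mathbf{v}\|^2 + \|\mathbf{u} - \mathbf{v}\|^2 = 2(\|\mathbf{u}\|^2 + \|\mathbf{v}\|^2)$.

(b) $\|\mathbf{u} + \mathbf{v}\|^2 - \|\mathbf{u} - \mathbf{v}\|^2 = 4\mathbf{u} \cdot \mathbf{v}$.

3.8 If $\mathbf{u}, \mathbf{v} \in \mathbb{R}^{n \times 1}$ and $A \in \mathbb{R}^{n \times n}$, show that

$$(\mathbf{u}^T A^T A\mathbf{v})^2 \le (\mathbf{u}^T A^T A\mathbf{u})(\mathbf{v}^T A^T A\mathbf{v}).$$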
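
3.9 Prove Theorem 3.5.
3.10 For which value of $k$ are $\mathbf{u} = [2, 1, 3]$ and $\mathbf{v} = [1, 7, k]$ orthogonal?
3.11 Let $\mathbf{u}, \mathbf{v} \in \mathbb{R}^n$. Show that $\mathbf{u}$ and $\mathbf{v}$ are orthogonal if and only if
$$\|\mathbf{u} + \mathbf{v}\| = \|\mathbf{u} - \mathbf{v}\|.$$
3.12 Let $\mathbf{u}, \mathbf{v}, \mathbf{w} \in \mathbb{R}^n$.
(a) If $\mathbf{u}$ is orthogonal to $\mathbf{v}$ and $\mathbf{w}$, is $\mathbf{u}$ orthogonal to $\mathbf{v} + \mathbf{w}$?
(b) If $\mathbf{u}$ is orthogonal to $\mathbf{v} + \mathbf{w}$, is $\mathbf{u}$ orthogonal to $\mathbf{v}$ and $\mathbf{w}$?

3.13 Let $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_r \in \mathbb{R}^n$. Show that if $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_r$ are pairwise orthogonal, i.e., $\mathbf{u}_i \cdot \mathbf{u}_j = 0$ for any $i \neq j$, then

$$\|\mathbf{u}_1 + \mathbf{u}_2 + \cdots + \mathbf{u}_r\|^2 = \|\mathbf{u}_1\|^2 + \|\mathbf{u}_2\|^2 + \cdots + \|\mathbf{u}_r\|^2.$$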


3.14 Show that 


z | = | ; | defines a linear transformation from RÊ to R?. 


7 | = | | does not define a linear transformation from R? to R?. 


3.15 Find the standard matrix for each of the following linear transformations. 


0 
Ass +224 + 
(ED |\| gape eres yr T | a) ets 
m 2X2 E 323 T3 — T3 
: T4 T2 


3.16 For each part, find the standard matrices for $T_1$ and $T_2$, then determine
whether $T_1 \circ T_2 = T_2 \circ T_1$.

(a) $T_1: \mathbb{R}^2 \to \mathbb{R}^2$ is the reflection about the $x$-axis, and $T_2: \mathbb{R}^2 \to \mathbb{R}^2$ is the
reflection about the $y$-axis.

(b) $T_1: \mathbb{R}^2 \to \mathbb{R}^2$ is the reflection about the $x$-axis, and $T_2: \mathbb{R}^2 \to \mathbb{R}^2$ is the
orthogonal projection on the $y$-axis.

(c) $T_1: \mathbb{R}^2 \to \mathbb{R}^2$ is the rotation through an angle $\theta$, and $T_2: \mathbb{R}^2 \to \mathbb{R}^2$ is the
reflection about the $y$-axis.

(d) $T_1: \mathbb{R}^3 \to \mathbb{R}^3$ is the orthogonal projection on the $xy$-plane, and $T_2: \mathbb{R}^3 \to \mathbb{R}^3$
is the orthogonal projection on the $yz$-plane.

(e) $T_1: \mathbb{R}^3 \to \mathbb{R}^3$ is the counterclockwise rotation about the positive $x$-axis
through an angle $\theta_1$, and $T_2: \mathbb{R}^3 \to \mathbb{R}^3$ is the counterclockwise rotation about
the positive $y$-axis through an angle $\theta_2$.


3.17 Let $T: \mathbb{R}^2 \to \mathbb{R}^2$ be the linear transformation that maps each vector into its
reflection about the line $l$.

(a) Find the standard matrix for $T$ if $l$ is the line in the $xy$-plane that passes
through the origin and makes an angle $\theta$ with the positive $x$-axis, where
$0 < \theta \le \pi/2$.

(b) Find the reflection of the vector $\mathbf{x} = [1, 5]^T$ about the line $l$ through the origin
that makes an angle of $\theta = \pi/6$ with the positive $x$-axis.

3.18 Let $T: \mathbb{R}^3 \to \mathbb{R}^3$ be the linear transformation that counterclockwise rotates
each vector about the positive $y$-axis through an angle $\theta$, where $0 < \theta < \pi/2$.

(a) Find the standard matrix for $T$.

(b) Find the rotation of the vector $\mathbf{x} = [-5, 1, 2]^T$ through an angle of $\theta = \pi/3$.

3.19 Let $T_1: \mathbb{R}^n \to \mathbb{R}^m$ and $T_2: \mathbb{R}^m \to \mathbb{R}^s$ be linear transformations.

(a) If $T_1$ and $T_2$ are one-to-one, is $T_2 \circ T_1$ one-to-one?

(b) If either $T_1$ or $T_2$ is one-to-one, is $T_2 \circ T_1$ one-to-one?


3.20 Determine if each linear transformation $T: \mathbb{R}^n \to \mathbb{R}^n$ ($n = 2, 3$) defined by
the given equations is one-to-one; if so, find the standard matrix for the inverse
transformation, and find $T^{-1}$.


Wy = Myr 222 T & 
WS WMT 2x2 

(a) (b) W2 = —%1ı t Lg—- T3 
W2 = 21T 2. 

wz = Wr T2 T 323 


Challenge exercises 


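3.21 Let $\mathbf{u}, \mathbf{v} \in \mathbb{R}^n$. Show that if $\mathbf{u} \cdot \mathbf{w} = \mathbf{v} \cdot \mathbf{w}$ holds for all $\mathbf{w} \in \mathbb{R}^n$, then $\mathbf{u} = \mathbf{v}$.

3.22 Let $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times m}$. Show that if $(A\mathbf{x}) \cdot \mathbf{y} = \mathbf{x} \cdot (B\mathbf{y})$ for all
$\mathbf{x} \in \mathbb{R}^n$ and $\mathbf{y} \in \mathbb{R}^m$, then $B = A^T$.

3.23 Let $\mathbf{x}, \mathbf{y}, \mathbf{z} \in \mathbb{R}^n$. Show that

$$\|\mathbf{x} - \mathbf{y}\|^2 = 2\|\mathbf{z} - \mathbf{x}\|^2 + 2\|\mathbf{z} - \mathbf{y}\|^2 - 4\left\|\mathbf{z} - \frac{\mathbf{x} + \mathbf{y}}{2}\right\|^2.$$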


3.24 Using the Cauchy-Schwarz inequality, show that

(a) If $a_1, a_2, \ldots, a_n > 0$, then

$$(a_1 + a_2 + \cdots + a_n)\left(\frac{1}{a_1} + \frac{1}{a_2} + \cdots + \frac{1}{a_n}\right) \ge n^2.$$


(b) If a,b,c > 0, then 


(5 b+ J gli yalp 
t Cc = al + t C 
3° 2 6 
(c) If $a_1, a_2, \ldots, a_n, w_1, w_2, \ldots, w_n > 0$ and $\sum_{k=1}^{n} w_k = 1$, then

$$\left(\sum_{k=1}^{n} a_k w_k\right)^2 \le \sum_{k=1}^{n} a_k^2 w_k.$$


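3.25 Let $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$. Show that the following are equivalent.

(a) $\mathbf{x} \cdot \mathbf{y} \le 0$.

(b) $\|\mathbf{x}\| \le \|\mathbf{x} - \alpha\mathbf{y}\|$ for all $\alpha \ge 0$.

(c) $\|\mathbf{x}\| \le \|\mathbf{x} - \alpha\mathbf{y}\|$ for all $\alpha \in [0, 1]$.

3.26 Let $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$. Show that the following are equivalent.

(a) $\mathbf{x} \cdot \mathbf{y} = 0$.

(b) $\|\mathbf{x}\| \le \|\mathbf{x} - \alpha\mathbf{y}\|$ for all $\alpha \in \mathbb{R}$.

(c) $\|\mathbf{x}\| \le \|\mathbf{x} - \alpha\mathbf{y}\|$ for all $\alpha \in [-1, 1]$.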
3.27 Let $T_A: \mathbb{R}^3 \to \mathbb{R}^3$ be the matrix transformation such that for all $\mathbf{x} \in \mathbb{R}^3$,
$T_A(\mathbf{x}) \cdot \mathbf{x} = A\mathbf{x} \cdot \mathbf{x} = 0$.
(a) Show that $A$ is not invertible.
(b) Is a similar assertion true for a matrix transformation $T_A: \mathbb{R}^2 \to \mathbb{R}^2$?

3.28 Let $T_A$ be the matrix transformation from $\mathbb{R}^m$ to $\mathbb{R}^n$, where $m < n$. Show
that the following are equivalent.

(a) $T_A$ is one-to-one.

(b) There exists a matrix transformation $T_B$ from $\mathbb{R}^n$ to $\mathbb{R}^m$ such that $T_B \circ T_A = T_I$,
where $T_I$ is the identity transformation on $\mathbb{R}^m$.

(c) For any matrix transformations $T_C$ and $T_D$ from $\mathbb{R}^n$ to $\mathbb{R}^m$ satisfying $T_A \circ T_C = T_A \circ T_D$,
we have $T_C = T_D$.


Chapter 4 


General Vector Spaces 


“Mathematics is the art of giving the same name to different things.”


— Henri Poincaré 


“Mathematics is the tool specially suited for dealing with abstract concepts of any kind and
there is no limit to its power in this field.”
— Paul Dirac 


In this chapter, we generalize the concept of vectors in $\mathbb{R}^n$ further. If a class of objects
with two operations satisfies a set of axioms, then those objects are entitled to be called
“vectors”. Moreover, since the axioms of generalized vectors are based on properties
of vectors in $\mathbb{R}^n$, the generalized vectors have many similar properties. Thus, this
generalization provides a powerful tool to extend geometric properties of vectors in
$\mathbb{R}^n$ to many important mathematical problems where geometric intuition may not
be available. Consequently, if we have a problem involving our generalized vectors,
say matrices or functions, we may study the problem based on the corresponding
one in $\mathbb{R}^n$.


4.1 Real Vector Spaces 


In this section, we extend the concept of vectors in $\mathbb{R}^n$ by extracting the most
fundamental properties from them and turning those properties into axioms for our
generalized vectors.


4.1.1 Vector space axioms 


The following definition is extremely useful for many purposes. It consists of two 
operations and eight axioms. 


Definition Let $V$ be a nonempty set of objects on which two operations are defined,
addition and scalar multiplication. We require that $V$ be closed under addition
and scalar multiplication, i.e., for each pair of objects $\mathbf{u}$ and $\mathbf{v}$ in $V$, $\mathbf{u} + \mathbf{v}$ is in $V$,
and for each scalar $k$ and each object $\mathbf{u}$ in $V$, $k\mathbf{u}$ is in $V$. Then $V$ is called a vector
space and the objects in $V$ are said to be vectors if the following eight axioms are
satisfied for all $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ in $V$.


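(i) $\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$.

(ii) $\mathbf{u} + (\mathbf{v} + \mathbf{w}) = (\mathbf{u} + \mathbf{v}) + \mathbf{w}$.

(iii) There is an object $\mathbf{0}$ in $V$, called a zero vector for $V$, such that for all $\mathbf{u}$ in $V$,
$\mathbf{u} + \mathbf{0} = \mathbf{u}$.

(iv) For each $\mathbf{u}$ in $V$, there is an object $-\mathbf{u}$ in $V$, called a negative of $\mathbf{u}$, such that
$\mathbf{u} + (-\mathbf{u}) = \mathbf{0}$.

(v) $k(\mathbf{u} + \mathbf{v}) = k\mathbf{u} + k\mathbf{v}$.

(vi) $(k + l)\mathbf{u} = k\mathbf{u} + l\mathbf{u}$.

(vii) $k(l\mathbf{u}) = (kl)\mathbf{u}$.

(viii) $1\mathbf{u} = \mathbf{u}$.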


Here k and l are scalars. If the scalars are in R, then V is called a real vector 
space. 


Remark In fact, Axiom (i) is not necessary because it can be deduced from the other
axioms. Hence there is no need to list it explicitly. See Appendix A for details.


In the following, all scalars will be real numbers until Section 8.3. 
Examples 


(a) The set $\mathbb{R}^n$ with the operations of addition and scalar multiplication defined
in Subsection 3.1.1 is a typical example of a vector space.

(b) The set $\mathbb{R}^{m \times n}$ with the operations of matrix addition and scalar multiplication
is a vector space.


(c) For all functions $f = f(x)$ and $g = g(x)$ defined on $(-\infty, \infty)$, we define the
following operations of function addition and scalar multiplication

$$(f + g)(x) := f(x) + g(x), \qquad (kf)(x) := k f(x),$$

where $k$ is a scalar. Then the set of functions with these two operations is a
vector space, denoted by $F(-\infty, \infty)$. Note that the zero vector $\mathbf{0} \in F(-\infty, \infty)$
is the zero function defined by $\mathbf{0}(x) = 0$ for all $x \in (-\infty, \infty)$.

4.1.2 Some properties


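Theorem 4.1 Let $V$ be a vector space, $\mathbf{u}$ be a vector in $V$, and $k$ be any scalar.
Then

(a) $0\mathbf{u} = \mathbf{0}$.

(b) $k\mathbf{0} = \mathbf{0}$.

(c) $(-1)\mathbf{u} = -\mathbf{u}$.

(d) If $k\mathbf{u} = \mathbf{0}$, then $k = 0$ or $\mathbf{u} = \mathbf{0}$.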


Proof We only prove (a) and leave the proofs of the remaining parts as an exercise.

For (a), we can write by Axiom (vi),

$$0\mathbf{u} + 0\mathbf{u} = (0 + 0)\mathbf{u} = 0\mathbf{u}.$$

By Axiom (iv), the vector $0\mathbf{u}$ has a negative $-0\mathbf{u}$. Adding this negative
to both sides above yields

$$[0\mathbf{u} + 0\mathbf{u}] + (-0\mathbf{u}) = 0\mathbf{u} + (-0\mathbf{u}).$$

Then we have by Axiom (ii),

$$0\mathbf{u} + [0\mathbf{u} + (-0\mathbf{u})] = 0\mathbf{u} + (-0\mathbf{u}).$$

It follows from Axiom (iv) that

$$0\mathbf{u} + \mathbf{0} = \mathbf{0}.$$

Thus, by Axiom (iii), we conclude that

$$0\mathbf{u} = \mathbf{0}.$$


4.2 Subspaces 


We consider a special kind of subset of a vector space $V$ that is itself a vector space
under the operations of addition and scalar multiplication defined on $V$.

4.2.1 Definition of subspace

Definition A subset $W$ of a vector space $V$ is called a subspace of $V$ if $W$ is itself
a vector space with respect to the addition and scalar multiplication defined on $V$.

The following theorem states that a nonempty subset $W \subseteq V$ is a subspace if and only if $W$ is closed
under the operations of addition and scalar multiplication.


Theorem 4.2 Let W be a nonempty set of vectors in a vector space V. Then W 
is a subspace of V if and only if the following conditions hold. 


(a) If u and v are in W, then u + v is in W. 


(b) If k is any scalar and u is in W, then ku is in W. 


Proof Let W be a subspace of V. It follows from the definition of vector space that 
conditions (a) and (b) hold. 


Conversely, assume conditions (a) and (b) hold. Axioms (i), (ii), (v), (vi), (vii),
(viii) are automatically satisfied by the vectors in $W$ since they are satisfied by all
vectors in $V$. Therefore, to complete the proof, we need only verify that Axioms (iii)
and (iv) are satisfied by vectors in $W$. Let $\mathbf{u} \in W$. By condition (b), we know that
$k\mathbf{u}$ is in $W$ for every scalar $k$. Setting $k = 0$, it follows from Theorem 4.1 (a) that
$0\mathbf{u} = \mathbf{0}$ is in $W$. Setting $k = -1$, it follows from Theorem 4.1 (c) that $(-1)\mathbf{u} = -\mathbf{u}$
is in $W$. Thus, Axioms (iii) and (iv) hold.


Examples 


(a) The following subsets are subspaces of $\mathbb{R}^2$ or $\mathbb{R}^3$.
In $\mathbb{R}^2$: $\{\mathbf{0}\}$; lines through the origin; $\mathbb{R}^2$.
In $\mathbb{R}^3$: $\{\mathbf{0}\}$; lines through the origin; planes through the origin; $\mathbb{R}^3$.

For instance, one can easily check that the sum of two vectors on a line $l$
through the origin of $\mathbb{R}^2$ or $\mathbb{R}^3$ also lies on $l$, and a scalar multiple of a vector
on this line $l$ is on $l$ as well. Thus, by Theorem 4.2, the line $l$ through the
origin is a subspace.

(b) Let $W$ be the set of all $n \times n$ symmetric matrices. For any two matrices
$A, B \in W$ and any scalar $k$, it follows from Theorem 1.21 (b) and (c) that
$A + B$ and $kA$ are both symmetric matrices. Thus, by Theorem 4.2, $W$ is a
subspace of $\mathbb{R}^{n \times n}$.
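
The closure conditions of Theorem 4.2 for symmetric matrices can be spot-checked numerically. The following is a minimal sketch, not part of the text: the helper names (`transpose`, `is_symmetric`, `add`, `scale`) and the sample matrices are illustrative assumptions, and a check on samples does not replace the proof.

```python
def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]

def is_symmetric(m):
    """A matrix is symmetric when it equals its transpose."""
    return m == transpose(m)

def add(a, b):
    """Entrywise sum of two matrices of the same size."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(k, a):
    """Scalar multiple k * a."""
    return [[k * x for x in row] for row in a]

# Two sample 3 x 3 symmetric matrices (chosen arbitrarily for illustration).
A = [[1, 2, 3], [2, 0, -1], [3, -1, 4]]
B = [[0, 5, 1], [5, 2, 2], [1, 2, -3]]

closed_under_addition = is_symmetric(add(A, B))
closed_under_scaling = is_symmetric(scale(-7, A))
print(closed_under_addition, closed_under_scaling)
```

Both flags come out true for these samples, in line with conditions (a) and (b) of Theorem 4.2.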


(c) Let $F(-\infty, \infty)$ be the vector space of functions discussed in Subsection 4.1.1,
$n$ be a positive integer, and $P_n$ be the subset of $F(-\infty, \infty)$ consisting of all
polynomials of the form

$$p := p(x) = a_0 + a_1 x + \cdots + a_n x^n,$$

where $a_0, a_1, \ldots, a_n$ are real numbers. Then $P_n$ consists of all real polynomials
of degree $n$ or less. We now show that $P_n$ is a subspace of $F(-\infty, \infty)$. For any
two polynomials $p, q \in P_n$ and any scalar $k$, one can directly check that $p + q$
and $kp$ are both in $P_n$. Thus, by Theorem 4.2, $P_n$ is a subspace of $F(-\infty, \infty)$.
Moreover, we have the following chain of subspaces:

$$F(-\infty, \infty) \supset \cdots \supset P_n \supset P_{n-1} \supset \cdots \supset P_1 \supset P_0.$$


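(d) We consider the solution set of a homogeneous linear system $A\mathbf{x} = \mathbf{0}$, where $A$
is an $m \times n$ matrix. Let $\mathbf{u}$, $\mathbf{v}$ be any two solutions and $k$ be any scalar. Then
$A\mathbf{u} = \mathbf{0}$ and $A\mathbf{v} = \mathbf{0}$. We therefore have

$$A(\mathbf{u} + \mathbf{v}) = A\mathbf{u} + A\mathbf{v} = \mathbf{0} + \mathbf{0} = \mathbf{0}, \qquad A(k\mathbf{u}) = kA\mathbf{u} = k\mathbf{0} = \mathbf{0}.$$

So $\mathbf{u} + \mathbf{v}$ and $k\mathbf{u}$ lie in the solution set. It follows from Theorem 4.2 that
the solution set is a subspace of $\mathbb{R}^n$. Thus, the solution set will be called the
solution space of $A\mathbf{x} = \mathbf{0}$.

(e) Let $W$ and $U$ be two subspaces of a vector space $V$. Then $W \cap U$ is a subspace
of $V$ and $W + U$ is also a subspace of $V$, where

$$W \cap U := \{\mathbf{v} \mid \mathbf{v} \in W \text{ and } \mathbf{v} \in U\}, \qquad W + U := \{\mathbf{w} + \mathbf{u} \mid \mathbf{w} \in W,\ \mathbf{u} \in U\}. \tag{4.1}$$

See Exercise 4.2.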


4.2.2 Linear combinations 


Addition and scalar multiplication are frequently used in combination to construct
new vectors.

Definition Let $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$ be vectors in a vector space $V$. A vector $\mathbf{w}$ in $V$ is
called a linear combination of $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$ if it can be written in the form

$$\mathbf{w} = k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_r\mathbf{v}_r,$$

where $k_1, k_2, \ldots, k_r$ are scalars.

Theorem 4.3 Let $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$ be vectors in a vector space $V$. Then

(a) The set $W$ of all linear combinations of $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$ is a subspace of $V$.

(b) $W$ is the smallest subspace of $V$ that contains $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$ in the sense that
every other subspace of $V$ that contains $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$ must contain $W$.


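Proof For (a), let $\mathbf{u}$ and $\mathbf{w}$ be vectors in $W$. Since $W$ is the set of all linear
combinations of the vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$, the vectors $\mathbf{u}$ and $\mathbf{w}$ can be written as

$$\mathbf{u} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_r\mathbf{v}_r, \qquad \mathbf{w} = d_1\mathbf{v}_1 + d_2\mathbf{v}_2 + \cdots + d_r\mathbf{v}_r.$$

For any scalar $k$, we have

$$\begin{aligned} \mathbf{u} + k\mathbf{w} &= c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_r\mathbf{v}_r + k(d_1\mathbf{v}_1 + d_2\mathbf{v}_2 + \cdots + d_r\mathbf{v}_r) \\ &= (c_1 + kd_1)\mathbf{v}_1 + (c_2 + kd_2)\mathbf{v}_2 + \cdots + (c_r + kd_r)\mathbf{v}_r, \end{aligned}$$

i.e., $\mathbf{u} + k\mathbf{w}$ is in $W$. Taking $k = 1$ gives closure under addition, and taking $\mathbf{u} = \mathbf{0}$
(which is in $W$, with all coefficients equal to zero) gives closure under scalar multiplication.
It follows from Theorem 4.2 that $W$ is a subspace of $V$.

For (b), let $U$ be another subspace of $V$ that contains $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$. It follows from
Theorem 4.2 again that $U$ must contain all linear combinations of $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$.
Any $\mathbf{w} \in W$ can be expressed in the form

$$\mathbf{w} = k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_r\mathbf{v}_r.$$

Then $\mathbf{w}$ is in $U$. Thus, $W \subseteq U$.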


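Definition Let $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ be a set of vectors in a vector space $V$ and $W$
be the subspace of $V$ consisting of all linear combinations of the vectors in $S$. Then
$W$ is called the subspace spanned by $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$ and denoted by

$$W := \operatorname{span}\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\} \quad \text{or} \quad W := \operatorname{span}(S).$$

Example 1 The polynomials $1, x, x^2, \ldots, x^n$ span the vector space $P_n$ since each
polynomial $p \in P_n$ can be written as

$$p = a_0 + a_1 x + \cdots + a_n x^n,$$

which is a linear combination of $1, x, x^2, \ldots, x^n$. Thus,

$$P_n = \operatorname{span}\{1, x, x^2, \ldots, x^n\}.$$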


Example 2 Determine whether $\mathbf{v}_1 = [1, 3, 2]$, $\mathbf{v}_2 = [1, 0, 2]$, and $\mathbf{v}_3 = [2, 3, 4]$ span
the vector space $\mathbb{R}^3$.

Solution We must determine whether an arbitrary vector $\mathbf{b} = [b_1, b_2, b_3] \in \mathbb{R}^3$ can
be written in the form

$$\mathbf{b} = k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + k_3\mathbf{v}_3.$$

Expressing this equation in terms of components yields

$$[b_1, b_2, b_3] = k_1[1, 3, 2] + k_2[1, 0, 2] + k_3[2, 3, 4],$$

i.e.,

$$[b_1, b_2, b_3] = [k_1 + k_2 + 2k_3,\ 3k_1 + 3k_3,\ 2k_1 + 2k_2 + 4k_3]$$

or

$$\begin{aligned} k_1 + k_2 + 2k_3 &= b_1 \\ 3k_1 + 3k_3 &= b_2 \\ 2k_1 + 2k_2 + 4k_3 &= b_3. \end{aligned}$$


Thus, the problem reduces to determining whether the system is consistent for all
values of $b_1$, $b_2$, and $b_3$. By Theorem 3.9 (e) and (g), a system with a square
coefficient matrix is consistent for every vector on the right-hand side if and only if
the determinant of the coefficient matrix of the system is not equal to zero. However,

$$\det \begin{bmatrix} 1 & 1 & 2 \\ 3 & 0 & 3 \\ 2 & 2 & 4 \end{bmatrix} = 0.$$

Thus, $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ do not span $\mathbb{R}^3$.
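
The determinant test in Example 2 can be reproduced with a few lines of code. This is a minimal sketch under the assumption that the coefficient matrix has $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$ as its columns, as in the system above; the function name `det3` is illustrative and not from the text.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# Coefficient matrix of Example 2: its columns are v1, v2, v3.
coeff = [[1, 1, 2],
         [3, 0, 3],
         [2, 2, 4]]

d = det3(coeff)
spans_r3 = d != 0  # by the determinant criterion quoted in the solution
print(d, spans_r3)
```

The zero determinant reflects the relation $\mathbf{v}_3 = \mathbf{v}_1 + \mathbf{v}_2$, so the three vectors span only a plane through the origin.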


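Theorem 4.4 Let $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ and $S' = \{\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_k\}$ be two sets of
vectors in a vector space $V$. Then

$$\operatorname{span}\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\} = \operatorname{span}\{\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_k\}$$

if and only if each vector in $S$ is a linear combination of those in $S'$, and conversely
each vector in $S'$ is a linear combination of those in $S$.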


The proof of Theorem 4.4 is left as an exercise. 


4.3 Linear Independence 


Recall that a set of vectors $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ spans a given vector space $V$ if
every vector $\mathbf{u} \in V$ can be written as a linear combination of the vectors in $S$, i.e.,

$$\mathbf{u} = k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_r\mathbf{v}_r,$$

where $k_j$ ($1 \le j \le r$) are scalars. In general, there may be many different ways to
express a vector in $V$ as a linear combination of the vectors in $S$. In this section,
we study conditions on $S$ under which each vector in $V$ can be expressed as a linear
combination of the vectors in $S$ in a unique way.


4.3.1 Linear independence and linear dependence 


Definition Let $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ be a given nonempty set of vectors. Then the
vector equation

$$k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_r\mathbf{v}_r = \mathbf{0}$$

always has at least one solution, namely

$$k_1 = k_2 = \cdots = k_r = 0.$$

If this is the only solution, then $S$ is called a linearly independent set (or
$\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$ are said to be linearly independent). If there exist nonzero solutions,
then $S$ is called a linearly dependent set (or $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$ are said to be linearly
dependent).
Examples 


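(a) Determine whether $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is linearly independent or not, where
$\mathbf{v}_1 = [2, -2, 1]$, $\mathbf{v}_2 = [5, 3, -2]$, $\mathbf{v}_3 = [7, 1, -1]$.

Solution Since

$$\mathbf{v}_1 + \mathbf{v}_2 - \mathbf{v}_3 = \mathbf{0},$$

$S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is linearly dependent.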
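
When a dependence relation is not visible by inspection, linear independence in $\mathbb{R}^n$ can be tested mechanically: the vectors are independent exactly when the matrix having them as rows has as many pivots as there are vectors. The sketch below uses Gaussian elimination over exact fractions; the names `rank` and `is_independent` are illustrative assumptions, not notation from the text.

```python
from fractions import Fraction

def rank(rows):
    """Number of pivots found by Gaussian elimination (exact arithmetic)."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0  # number of pivot rows fixed so far
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        # Look for a row at or below position r with a nonzero entry in this column.
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        # Eliminate the entries below the pivot.
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def is_independent(vectors):
    """Vectors are linearly independent iff the rank equals their number."""
    return rank(vectors) == len(vectors)

S = [[2, -2, 1], [5, 3, -2], [7, 1, -1]]  # the set from Example (a)
print(rank(S), is_independent(S))
```

For the set in Example (a) the elimination finds only two pivots, which agrees with the relation $\mathbf{v}_1 + \mathbf{v}_2 - \mathbf{v}_3 = \mathbf{0}$ found by inspection.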


(b) Show that the polynomials $1, x, x^2, \ldots, x^n$ are linearly independent in $P_n$.
Proof We consider the following equation

$$c_0 + c_1 x + c_2 x^2 + \cdots + c_n x^n = \mathbf{0}(x), \qquad x \in (-\infty, \infty), \tag{4.2}$$

where $\mathbf{0}(x)$ is the zero function. Recall from the Fundamental Theorem of Algebra
[11] that any nonzero polynomial of degree $n$ in one variable has at most $n$ distinct
complex roots. However, the polynomial $c_0 + c_1 x + c_2 x^2 + \cdots + c_n x^n$ in (4.2) has
infinitely many roots, since every real number $x$ is a root. Therefore, it cannot be a
nonzero polynomial, and its coefficients must all be zero, i.e.,

$$c_0 = c_1 = c_2 = \cdots = c_n = 0.$$

Thus, $1, x, x^2, \ldots, x^n$ are linearly independent in $P_n$.

4.3.2 Some theorems


The following theorems are concerned with some basic properties of linear independence.


Theorem 4.5 Let S be a set with two or more vectors. Then 


(a) S is linearly dependent if and only if at least one of the vectors in S is expressible 
as a linear combination of the other vectors in S. 


(b) S is linearly independent if and only if no vector in S is expressible as a linear 
combination of the other vectors in S. 


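Proof Let $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ with $r \ge 2$.

For (a), based on the definition of a linearly dependent set, $S$ is linearly dependent
if and only if

$$k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_r\mathbf{v}_r = \mathbf{0}$$

has nontrivial solutions, i.e., there exists at least one nonzero $k_t$ for some $t$, so that

$$\mathbf{v}_t = -\frac{k_1}{k_t}\mathbf{v}_1 - \cdots - \frac{k_{t-1}}{k_t}\mathbf{v}_{t-1} - \frac{k_{t+1}}{k_t}\mathbf{v}_{t+1} - \cdots - \frac{k_r}{k_t}\mathbf{v}_r.$$

Thus, (a) holds. Part (b) follows immediately from (a).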


Theorem 4.6 A set of a finite number of vectors that contains the zero vector is 
linearly dependent. 


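Proof Let $S = \{\mathbf{0}, \mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ and consider the following equation

$$k_0\mathbf{0} + k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_r\mathbf{v}_r = \mathbf{0}. \tag{4.3}$$

Let $k_0 = 2$ and $k_1 = k_2 = \cdots = k_r = 0$. Then

$$2 \cdot \mathbf{0} + 0 \cdot \mathbf{v}_1 + 0 \cdot \mathbf{v}_2 + \cdots + 0 \cdot \mathbf{v}_r = \mathbf{0},$$

i.e., equation (4.3) has a nonzero solution. Thus, $S$ is linearly dependent.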


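Theorem 4.7 Let $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ be a set of vectors in $\mathbb{R}^n$. If $r > n$, then $S$
is linearly dependent.

Proof We may assume that the vectors in $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ are column vectors.
Consider the following equation

$$k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_r\mathbf{v}_r = \mathbf{0}.$$

We can rewrite it in the following matrix form

$$\begin{bmatrix} \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_r \end{bmatrix} \begin{bmatrix} k_1 \\ k_2 \\ \vdots \\ k_r \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix},$$

which is a homogeneous system of $n$ linear equations in $r$ unknowns. Since $r > n$,
it follows from Theorem 1.1 that the system above has infinitely many solutions,
and in particular nonzero solutions. Thus, $S$ is linearly dependent.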


4.4 Basis and Dimension 


How can the vectors in a vector space be generated? There exist some linearly
independent subsets which can span the entire vector space. For instance, $\mathbb{R}^2 =
\operatorname{span}\{[1, 0], [0, 1]\}$ and $P_2 = \operatorname{span}\{1, x, x^2\}$. The concepts of basis and dimension
arise from subsets of this kind.


4.4.1 Basis for vector space 


Definition Let $V$ be any vector space and $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ be a set of vectors
in $V$. Then $S$ is called a basis for $V$ if the following two conditions hold.

(i) $S$ is linearly independent.
(ii) $V = \operatorname{span}(S)$.

Theorem 4.8 Let $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ be a basis for a vector space $V$. Then every
vector $\mathbf{u}$ in $V$ can be expressed in the form

$$\mathbf{u} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_n\mathbf{v}_n$$

in exactly one way.


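Proof We only need to show that there is only one way to express a vector $\mathbf{u} \in V$
as a linear combination of the vectors in $S$. Suppose that $\mathbf{u}$ can be written as

$$\mathbf{u} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_n\mathbf{v}_n$$

and also as

$$\mathbf{u} = d_1\mathbf{v}_1 + d_2\mathbf{v}_2 + \cdots + d_n\mathbf{v}_n.$$

By subtracting the second equation from the first one, we obtain

$$\begin{aligned} \mathbf{0} = \mathbf{u} - \mathbf{u} &= (c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_n\mathbf{v}_n) - (d_1\mathbf{v}_1 + d_2\mathbf{v}_2 + \cdots + d_n\mathbf{v}_n) \\ &= (c_1 - d_1)\mathbf{v}_1 + (c_2 - d_2)\mathbf{v}_2 + \cdots + (c_n - d_n)\mathbf{v}_n. \end{aligned} \tag{4.4}$$

Since $S$ is linearly independent, (4.4) implies

$$c_i - d_i = 0, \quad \text{i.e.,} \quad c_i = d_i, \qquad 1 \le i \le n.$$

Thus, the two expressions for $\mathbf{u}$ are the same.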


4.4.2 Coordinates 
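Definition Let $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ be a basis for a vector space $V$ and

$$\mathbf{v} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_n\mathbf{v}_n$$

be the expression for a vector $\mathbf{v}$ in terms of the basis $S$. Then the scalars $c_1, c_2, \ldots, c_n$
are called the coordinates of $\mathbf{v}$ relative to the basis $S$. The vector $[c_1, c_2, \ldots, c_n]$ in
$\mathbb{R}^n$ is called the coordinate vector of $\mathbf{v}$ relative to $S$ and is denoted by

$$[\mathbf{v}]_S = [c_1, c_2, \ldots, c_n].$$

Remark Let $S = \{\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n\}$ be a set of vectors in $\mathbb{R}^n$, where

$$\mathbf{e}_1 = [1, 0, 0, \ldots, 0], \quad \mathbf{e}_2 = [0, 1, 0, \ldots, 0], \quad \ldots, \quad \mathbf{e}_n = [0, 0, 0, \ldots, 1].$$

One can show that $S$ is a basis, which is called the standard basis for $\mathbb{R}^n$. Every
vector $\mathbf{x} = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^n$ can be expressed as

$$\mathbf{x} = x_1\mathbf{e}_1 + x_2\mathbf{e}_2 + \cdots + x_n\mathbf{e}_n.$$

Then the coordinate vector of $\mathbf{x}$ relative to the standard basis $S$ is

$$[\mathbf{x}]_S = [x_1, x_2, \ldots, x_n].$$

Thus, $\mathbf{x} = [\mathbf{x}]_S$, i.e., a vector $\mathbf{x}$ and its coordinate vector relative to the standard
basis are the same.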


Example Let $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$, where $\mathbf{v}_1 = [1, 1, 2]$, $\mathbf{v}_2 = [0, 2, 1]$, and $\mathbf{v}_3 = [2, 1, 3]$.

(a) Show that $S$ is a basis for $\mathbb{R}^3$.

(b) Find the coordinate vector of $\mathbf{v} = [-3, 5, -1]$ relative to $S$.

(c) Find the vector $\mathbf{v}$ in $\mathbb{R}^3$ whose coordinate vector relative to $S$ is $[\mathbf{v}]_S = [-1, 2, 1]$.


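Solution For (a), to show that $S$ spans $\mathbb{R}^3$, we must show that any vector $\mathbf{b} =
[b_1, b_2, b_3] \in \mathbb{R}^3$ can be expressed as a linear combination of the vectors in $S$, i.e.,

$$\mathbf{b} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3.$$

Expressing this equation in terms of components gives

$$[b_1, b_2, b_3] = c_1[1, 1, 2] + c_2[0, 2, 1] + c_3[2, 1, 3],$$

or in matrix form

$$\begin{bmatrix} 1 & 0 & 2 \\ 1 & 2 & 1 \\ 2 & 1 & 3 \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}. \tag{4.5}$$

Since

$$\det \begin{bmatrix} 1 & 0 & 2 \\ 1 & 2 & 1 \\ 2 & 1 & 3 \end{bmatrix} = -1 \ne 0,$$

it follows from Theorem 3.9 that (4.5) has a unique solution for every $\mathbf{b}$. Thus, $S$
spans $\mathbb{R}^3$.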


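To prove that $S$ is linearly independent, we must show that the following equation

$$c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3 = \mathbf{0} \tag{4.6}$$

has only the zero solution $c_1 = c_2 = c_3 = 0$. In matrix form, (4.6) can be written as
a homogeneous system

$$\begin{bmatrix} 1 & 0 & 2 \\ 1 & 2 & 1 \\ 2 & 1 & 3 \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}, \tag{4.7}$$

which is a special case of (4.5) when $\mathbf{b} = \mathbf{0}$. Hence (4.7) has only the trivial (zero)
solution by Theorem 3.9 again. Thus, $S$ is a basis for $\mathbb{R}^3$.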


For (b), we consider the following equation

$$\mathbf{v} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3,$$

i.e.,

$$[-3, 5, -1] = c_1[1, 1, 2] + c_2[0, 2, 1] + c_3[2, 1, 3].$$

Equating corresponding components gives

$$\begin{bmatrix} 1 & 0 & 2 \\ 1 & 2 & 1 \\ 2 & 1 & 3 \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix} = \begin{bmatrix} -3 \\ 5 \\ -1 \end{bmatrix}.$$

Solving this system, we obtain $[c_1, c_2, c_3] = [1, 3, -2]$. Therefore,

$$[\mathbf{v}]_S = [1, 3, -2].$$


For (c), we obtain by using the definition of the coordinate vector $[\mathbf{v}]_S$,

$$\mathbf{v} = (-1)\mathbf{v}_1 + 2\mathbf{v}_2 + \mathbf{v}_3 = [1, 4, 3].$$
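
Parts (b) and (c) of this example can be reproduced computationally. The following is a minimal sketch, assuming the system is solved by Cramer's rule from Chapter 2 (the text does not say which method it uses); the names `det3`, `solve3`, and `combine` are illustrative and not from the text.

```python
from fractions import Fraction

def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def solve3(a, b):
    """Solve the 3 x 3 system a c = b by Cramer's rule (needs det(a) != 0)."""
    d = det3(a)
    if d == 0:
        raise ValueError("coefficient matrix is singular")
    solution = []
    for j in range(3):
        # Replace column j of a by the right-hand side b.
        aj = [row[:j] + [b[i]] + row[j + 1:] for i, row in enumerate(a)]
        solution.append(Fraction(det3(aj), d))
    return solution

def combine(coeffs, vectors):
    """Linear combination coeffs[0]*vectors[0] + coeffs[1]*vectors[1] + ..."""
    n = len(vectors[0])
    return [sum(c * v[i] for c, v in zip(coeffs, vectors)) for i in range(n)]

basis = [[1, 1, 2], [0, 2, 1], [2, 1, 3]]   # v1, v2, v3
M = [list(col) for col in zip(*basis)]      # matrix with v1, v2, v3 as columns

coords = solve3(M, [-3, 5, -1])             # part (b): coordinates of v relative to S
v = combine([-1, 2, 1], basis)              # part (c): vector with coordinates [-1, 2, 1]
print(coords, v)
```

The two directions are inverse to each other: `solve3` goes from a vector to its coordinate vector, and `combine` goes from a coordinate vector back to the vector.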


4.4.3 Dimension 


Definition A nonzero vector space V is called finite-dimensional if it contains 
a set of a finite number of vectors {v1, V2,..., Vn} that forms a basis. If no such set 
exists, V is called infinite-dimensional. In addition, the zero vector space is said 
to be finite-dimensional. 


Theorem 4.9 Let $V$ be a finite-dimensional vector space and $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$
be any basis. Then


(a) Every set with more than n vectors is linearly dependent. 
(b) No set with fewer than n vectors spans V. 


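Proof For (a), let $S' = \{\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_m\}$ be any set of $m$ vectors in $V$, where
$m > n$. Since $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is a basis, each $\mathbf{w}_i$ can be expressed as a linear
combination of the vectors in $S$:

$$\begin{aligned} \mathbf{w}_1 &= a_{11}\mathbf{v}_1 + a_{21}\mathbf{v}_2 + \cdots + a_{n1}\mathbf{v}_n \\ \mathbf{w}_2 &= a_{12}\mathbf{v}_1 + a_{22}\mathbf{v}_2 + \cdots + a_{n2}\mathbf{v}_n \\ &\ \ \vdots \\ \mathbf{w}_m &= a_{1m}\mathbf{v}_1 + a_{2m}\mathbf{v}_2 + \cdots + a_{nm}\mathbf{v}_n. \end{aligned} \tag{4.8}$$

To show that $S'$ is linearly dependent, we must find scalars $k_1, k_2, \ldots, k_m$, not all
zero, such that

$$k_1\mathbf{w}_1 + k_2\mathbf{w}_2 + \cdots + k_m\mathbf{w}_m = \mathbf{0}. \tag{4.9}$$

Using (4.8), equation (4.9) can be rewritten as

$$\begin{aligned} &(k_1 a_{11} + k_2 a_{12} + \cdots + k_m a_{1m})\mathbf{v}_1 + (k_1 a_{21} + k_2 a_{22} + \cdots + k_m a_{2m})\mathbf{v}_2 \\ &\quad + \cdots + (k_1 a_{n1} + k_2 a_{n2} + \cdots + k_m a_{nm})\mathbf{v}_n = \mathbf{0}. \end{aligned}$$

Since $S$ is linearly independent, we have

$$\begin{aligned} a_{11}k_1 + a_{12}k_2 + \cdots + a_{1m}k_m &= 0 \\ a_{21}k_1 + a_{22}k_2 + \cdots + a_{2m}k_m &= 0 \\ &\ \ \vdots \\ a_{n1}k_1 + a_{n2}k_2 + \cdots + a_{nm}k_m &= 0. \end{aligned}$$

Since there are more unknowns than equations ($m > n$), Theorem 1.1 guarantees the
existence of nontrivial (nonzero) solutions, i.e., equation (4.9) has nonzero solutions.
Thus, $S'$ is linearly dependent.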


The proof of (b) is left as an exercise. 


We immediately have the following corollary. 


Corollary 1 All bases for a finite-dimensional vector space have the same number 
of vectors. 


Definition The dimension of a finite-dimensional vector space $V$, denoted by
$\dim(V)$, is defined to be the number of vectors in a basis for $V$. In addition, the
dimension of the zero vector space is defined to be zero.

Corollary 2 If $\dim(V) = n$, then
(a) Every set with more than $n$ vectors is linearly dependent.
(b) No set with fewer than $n$ vectors spans $V$.

Example 1 Dimensions of some vector spaces:

$$\dim(\mathbb{R}^n) = n, \qquad \dim(P_n) = n + 1, \qquad \dim(\mathbb{R}^{m \times n}) = mn.$$


Example 2 Determine a basis for and the dimension of the solution space of the 
homogeneous system 


£1 + 32 + 43- T4 =0 


v4 £2 + 2x3 — 344 — z5 = 0 
v3 — 2%4 — z5 =0 


—2zı + La = 2&3 — z5 =Q. 




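Solution The solution of the given homogeneous system is

$$\begin{aligned} x_1 &= -s - t \\ x_2 &= 0 \\ x_3 &= s + 2t \\ x_4 &= t \\ x_5 &= s \end{aligned} \quad \Longrightarrow \quad \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} = s \begin{bmatrix} -1 \\ 0 \\ 1 \\ 0 \\ 1 \end{bmatrix} + t \begin{bmatrix} -1 \\ 0 \\ 2 \\ 1 \\ 0 \end{bmatrix},$$

which shows that $\{[-1, 0, 1, 0, 1]^T, [-1, 0, 2, 1, 0]^T\}$ is a basis for the solution space.
Thus, the dimension of the solution space is 2.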


4.4.4 Some fundamental theorems 


The following theorems reveal the subtle relationships among the concepts of 
spanning sets, linear independence, basis, and dimension. 


Theorem 4.10 (Plus/Minus Theorem) Let $S$ be a nonempty set of a finite number
of vectors in a vector space $V$.

(a) If $S$ is linearly independent, and if $\mathbf{v}$ is in $V$ but is outside of $\operatorname{span}(S)$, then
the set $S \cup \{\mathbf{v}\}$ is still linearly independent.

(b) Let $\mathbf{v}$ be a vector in $S$ that can be expressed as a linear combination of the other vectors
in $S$. If $S - \{\mathbf{v}\}$ denotes the set obtained by removing $\mathbf{v}$ from $S$, then

$$\operatorname{span}(S) = \operatorname{span}(S - \{\mathbf{v}\}).$$
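Proof For (a), let $S = \{\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_r\}$. Then

$$S \cup \{\mathbf{v}\} = \{\mathbf{v}, \mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_r\}.$$

Consider the following equation

$$k_0\mathbf{v} + k_1\mathbf{w}_1 + k_2\mathbf{w}_2 + \cdots + k_r\mathbf{w}_r = \mathbf{0}. \tag{4.10}$$

Then we must have $k_0 = 0$. Otherwise $\mathbf{v}$ can be expressed as a linear combination
of the vectors in $S$, i.e., $\mathbf{v} \in \operatorname{span}(S)$, which contradicts the fact that $\mathbf{v} \notin \operatorname{span}(S)$.
Hence (4.10) simplifies to

$$k_1\mathbf{w}_1 + k_2\mathbf{w}_2 + \cdots + k_r\mathbf{w}_r = \mathbf{0}.$$

Since $S$ is linearly independent, we deduce

$$k_1 = k_2 = \cdots = k_r = 0.$$

Thus, (4.10) only has the zero solution, i.e., $S \cup \{\mathbf{v}\}$ is still linearly independent.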
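For (b), let $S = \{\mathbf{v}, \mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_r\}$. It is obvious that

$$\operatorname{span}(S - \{\mathbf{v}\}) = \operatorname{span}\{\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_r\} \subseteq \operatorname{span}\{\mathbf{v}, \mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_r\} = \operatorname{span}(S).$$

Any vector $\mathbf{u} \in \operatorname{span}(S)$ can be expressed as

$$\mathbf{u} = c_0\mathbf{v} + c_1\mathbf{w}_1 + c_2\mathbf{w}_2 + \cdots + c_r\mathbf{w}_r. \tag{4.11}$$

Since $\mathbf{v} \in S$ and $\mathbf{v}$ can be expressed as a linear combination of the other vectors in $S$,
we have

$$\mathbf{v} = \sum_{j=1}^{r} d_j\mathbf{w}_j. \tag{4.12}$$

We can replace $\mathbf{v}$ in (4.11) with (4.12) and then

$$\mathbf{u} = c_0\left(\sum_{j=1}^{r} d_j\mathbf{w}_j\right) + c_1\mathbf{w}_1 + c_2\mathbf{w}_2 + \cdots + c_r\mathbf{w}_r = \sum_{j=1}^{r} (c_0 d_j + c_j)\mathbf{w}_j.$$

Therefore,

$$\mathbf{u} \in \operatorname{span}\{\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_r\} = \operatorname{span}(S - \{\mathbf{v}\}).$$

Thus,

$$\operatorname{span}(S) = \operatorname{span}(S - \{\mathbf{v}\}).$$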


Theorem 4.11 Let V be a vector space with dim(V) = n and S be a set in V with 
exactly n vectors. Then S is a basis for V if either V = span(S) or S is linearly 
independent. 


Proof We first assume that $V = \operatorname{span}(S)$ and $S$ has exactly $n$ vectors. To show
that $S$ is a basis, we must prove that $S$ is linearly independent. By contradiction,
we assume that $S$ is linearly dependent. It follows from Theorem 4.5 (a) that at
least one of the vectors in $S$, say $\mathbf{v}$, can be expressed as a linear combination of the
other vectors in $S$. Then we have by Theorem 4.10 (b),

$$\operatorname{span}(S - \{\mathbf{v}\}) = \operatorname{span}(S) = V.$$

Since $S - \{\mathbf{v}\}$ contains only $n - 1$ vectors, it follows from Theorem 4.9 (b) and the
given condition $\dim(V) = n$ that $V$ cannot be spanned by $S - \{\mathbf{v}\}$. A contradiction!
Thus, $S$ must be linearly independent.


We next assume that $S$ is linearly independent. To show that $S$ is a basis, we
must prove that $V = \operatorname{span}(S)$. By contradiction, we assume that there is a vector
$\mathbf{w} \in V$ but $\mathbf{w} \notin \operatorname{span}(S)$. By Theorem 4.10 (a), $S \cup \{\mathbf{w}\}$ is still linearly independent.
However, $S \cup \{\mathbf{w}\}$ has $n + 1$ vectors. It follows from Theorem 4.9 (a) and the given
condition $\dim(V) = n$ that $S \cup \{\mathbf{w}\}$ must be linearly dependent. A contradiction!
Thus, $V = \operatorname{span}(S)$.

Theorem 4.12 Let $S$ be a set of a finite number of vectors in a finite-dimensional
vector space $V$.

(a) If $S$ spans $V$ but is not a basis for $V$, then $S$ can be reduced to a basis for $V$
by removing appropriate vectors from $S$.

(b) If $S$ is a linearly independent set that is not already a basis for $V$, then $S$ can
be enlarged to a basis for $V$ by inserting appropriate vectors into $S$.

Proof Note that $V$ is a finite-dimensional vector space. Therefore, the following
removing process and inserting process can be completed in finitely many steps.

For (a), since $V = \operatorname{span}(S)$, by removing appropriate vectors from $S$, it follows from
Theorem 4.10 (b) that $S$ can be reduced to a subset of $S$ which forms a basis for $V$.

For (b), since $S$ is linearly independent, by inserting appropriate vectors into $S$, it
follows from Theorem 4.10 (a) that $S$ can be enlarged to a basis for $V$.


Theorem 4.13 Let $W$ be a subspace of a finite-dimensional vector space $V$. Then
we have $\dim(W) \le \dim(V)$. Moreover, if $\dim(W) = \dim(V)$, then $W = V$.


The proof of Theorem 4.13 is left as an exercise. 


4.4.5 Dimension theorem for subspaces 


Theorem 4.14 (Dimension Theorem for Subspaces) Let $V_1$ and $V_2$ be two subspaces
of a vector space $V$. Then

$$\dim(V_1) + \dim(V_2) = \dim(V_1 + V_2) + \dim(V_1 \cap V_2).$$
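Proof We assume that

$$\dim(V_1) = n_1, \qquad \dim(V_2) = n_2, \qquad \dim(V_1 \cap V_2) = m.$$

We can choose a basis $\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_m\}$ for $V_1 \cap V_2$. By Theorem 4.12 (b), there exist
$n_1 - m$ vectors $\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_{n_1 - m}$ such that the set

$$\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_m, \mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_{n_1 - m}\}$$

is a basis for $V_1$. Similarly, by Theorem 4.12 (b) again, there exist $n_2 - m$ vectors
$\mathbf{z}_1, \mathbf{z}_2, \ldots, \mathbf{z}_{n_2 - m}$ such that the set

$$\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_m, \mathbf{z}_1, \mathbf{z}_2, \ldots, \mathbf{z}_{n_2 - m}\}$$

is a basis for $V_2$. Since

$$V_1 = \operatorname{span}\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_m, \mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_{n_1 - m}\}$$

and

$$V_2 = \operatorname{span}\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_m, \mathbf{z}_1, \mathbf{z}_2, \ldots, \mathbf{z}_{n_2 - m}\},$$

it follows from (4.1) that

$$V_1 + V_2 = \operatorname{span}\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_m, \mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_{n_1 - m}, \mathbf{z}_1, \mathbf{z}_2, \ldots, \mathbf{z}_{n_2 - m}\}.$$

Next, we want to show that $\{\mathbf{x}_1, \ldots, \mathbf{x}_m, \mathbf{y}_1, \ldots, \mathbf{y}_{n_1 - m}, \mathbf{z}_1, \ldots, \mathbf{z}_{n_2 - m}\}$
is linearly independent. Consider the following equation

$$k_1\mathbf{x}_1 + \cdots + k_m\mathbf{x}_m + p_1\mathbf{y}_1 + \cdots + p_{n_1 - m}\mathbf{y}_{n_1 - m} + q_1\mathbf{z}_1 + \cdots + q_{n_2 - m}\mathbf{z}_{n_2 - m} = \mathbf{0}. \tag{4.13}$$

Let

$$\mathbf{v} = k_1\mathbf{x}_1 + \cdots + k_m\mathbf{x}_m + p_1\mathbf{y}_1 + \cdots + p_{n_1 - m}\mathbf{y}_{n_1 - m} \in V_1. \tag{4.14}$$

It follows from (4.13) that

$$\mathbf{v} = -q_1\mathbf{z}_1 - \cdots - q_{n_2 - m}\mathbf{z}_{n_2 - m} \in V_2. \tag{4.15}$$

Therefore, $\mathbf{v} \in V_1 \cap V_2$ and $\mathbf{v}$ can also be expressed as

$$\mathbf{v} = l_1\mathbf{x}_1 + l_2\mathbf{x}_2 + \cdots + l_m\mathbf{x}_m. \tag{4.16}$$

Combining (4.15) and (4.16), we obtain

$$l_1\mathbf{x}_1 + l_2\mathbf{x}_2 + \cdots + l_m\mathbf{x}_m + q_1\mathbf{z}_1 + \cdots + q_{n_2 - m}\mathbf{z}_{n_2 - m} = \mathbf{0}.$$

By the fact that $\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_m, \mathbf{z}_1, \mathbf{z}_2, \ldots, \mathbf{z}_{n_2 - m}\}$ is a linearly independent set, we
have

$$l_1 = \cdots = l_m = q_1 = \cdots = q_{n_2 - m} = 0.$$

Then $\mathbf{v} = \mathbf{0}$. Furthermore, (4.14) becomes

$$\mathbf{0} = k_1\mathbf{x}_1 + \cdots + k_m\mathbf{x}_m + p_1\mathbf{y}_1 + \cdots + p_{n_1 - m}\mathbf{y}_{n_1 - m}. \tag{4.17}$$

Since $\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_m, \mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_{n_1 - m}\}$ is a linearly independent set, it follows
from equation (4.17) that

$$k_1 = \cdots = k_m = p_1 = \cdots = p_{n_1 - m} = 0.$$

Therefore, all the coefficients in (4.13) must be zero, so

$$\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_m, \mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_{n_1 - m}, \mathbf{z}_1, \mathbf{z}_2, \ldots, \mathbf{z}_{n_2 - m}\}$$

is linearly independent and it forms a basis for $V_1 + V_2$. Hence

$$\dim(V_1 + V_2) = n_1 + n_2 - m = \dim(V_1) + \dim(V_2) - \dim(V_1 \cap V_2).$$

The result holds.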
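
The identity of Theorem 4.14 can be illustrated on a small example in $\mathbb{R}^3$, taking $V_1$ to be the $xy$-plane and $V_2$ the $yz$-plane, whose intersection is the $y$-axis. The sketch below is an illustration only: the example subspaces and the helper names `rank` and `in_span` are assumptions, and the intersection is identified by inspection rather than computed.

```python
from fractions import Fraction

def rank(rows):
    """Number of pivots found by Gaussian elimination (exact arithmetic)."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def in_span(v, vectors):
    """v lies in span(vectors) exactly when appending it leaves the rank unchanged."""
    return rank(vectors + [v]) == rank(vectors)

V1 = [[1, 0, 0], [0, 1, 0]]  # spanning set of the xy-plane
V2 = [[0, 1, 0], [0, 0, 1]]  # spanning set of the yz-plane
y_axis = [0, 1, 0]           # spans the intersection, found by inspection

dim_v1 = rank(V1)
dim_v2 = rank(V2)
dim_sum = rank(V1 + V2)      # V1 + V2 is spanned by the union of the two spanning sets
common_ok = in_span(y_axis, V1) and in_span(y_axis, V2)
dim_int = 1                  # dimension of the y-axis

print(dim_v1 + dim_v2, dim_sum + dim_int, common_ok)
```

Here both sides of the identity equal 4: two planes in $\mathbb{R}^3$ whose sum is the whole space must share a line.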
4.5 Row Space, Column Space, and Nullspace


In this section, we study three important vector spaces that are associated with 
matrices. 


4.5.1 Definition of row space, column space, and nullspace 
We first introduce the following definition of row vectors and column vectors. 


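Definition For an $m \times n$ matrix

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix},$$

the vectors

$$\mathbf{r}_1 = [a_{11}, a_{12}, \ldots, a_{1n}], \quad \mathbf{r}_2 = [a_{21}, a_{22}, \ldots, a_{2n}], \quad \ldots, \quad \mathbf{r}_m = [a_{m1}, a_{m2}, \ldots, a_{mn}] \tag{4.18}$$

in $\mathbb{R}^n$ are called the row vectors of $A$, and the vectors

$$\mathbf{c}_1 = \begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{bmatrix}, \quad \mathbf{c}_2 = \begin{bmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{bmatrix}, \quad \ldots, \quad \mathbf{c}_n = \begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{bmatrix} \tag{4.19}$$

in $\mathbb{R}^m$ are called the column vectors of $A$.

Definition Let $A$ be an $m \times n$ matrix. Then
(i) row space of $A$ := $\operatorname{span}\{\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_m\}$,
(ii) column space of $A$ := $\operatorname{span}\{\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_n\}$,
(iii) nullspace of $A$ := solution space of $A\mathbf{x} = \mathbf{0}$,
where $\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_m$ are the row vectors given in (4.18), and $\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_n$ are the
column vectors given in (4.19).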


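Theorem 4.15 A system of linear equations $A\mathbf{x} = \mathbf{b}$ is consistent if and only if $\mathbf{b}$
is in the column space of $A$.

Proof Let

$$A = \begin{bmatrix} \mathbf{c}_1 & \mathbf{c}_2 & \cdots & \mathbf{c}_n \end{bmatrix}, \qquad \mathbf{x} = [x_1, x_2, \ldots, x_n]^T,$$

where $\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_n$ are the column vectors of $A$. Then by using (1.5), the linear
system $A\mathbf{x} = \mathbf{b}$ is consistent if and only if

$$x_1\mathbf{c}_1 + x_2\mathbf{c}_2 + \cdots + x_n\mathbf{c}_n = \mathbf{b}$$

has solutions, which means that $\mathbf{b}$ can be written as a linear combination of
$\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_n$, i.e.,

$$\mathbf{b} \in \operatorname{span}\{\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_n\}.$$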
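
Theorem 4.15 suggests a simple computational test for consistency: $\mathbf{b}$ lies in the column space of $A$ exactly when adjoining $\mathbf{b}$ to the columns of $A$ does not increase the number of pivots. The following is a minimal sketch of that test; the names `rank` and `is_consistent` and the sample matrix are illustrative assumptions, not taken from the text.

```python
from fractions import Fraction

def rank(rows):
    """Number of pivots found by Gaussian elimination (exact arithmetic)."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def is_consistent(a, b):
    """A x = b is consistent iff b is in the span of the columns of A."""
    columns = [list(c) for c in zip(*a)]
    return rank(columns + [list(b)]) == rank(columns)

A = [[1, 2],
     [2, 4],
     [3, 6]]  # both columns are multiples of [1, 2, 3]

print(is_consistent(A, [1, 2, 3]), is_consistent(A, [1, 0, 0]))
```

The first right-hand side is the first column of `A`, so the system is consistent; the second is not a multiple of $[1, 2, 3]^T$, so it is not.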


4.5.2 Relation between solutions of Ax = 0 and Ax = b 


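Theorem 4.16 Let $\mathbf{x}_0$ be any single solution of a consistent linear system $A\mathbf{x} = \mathbf{b}$,
where $\mathbf{b}$ is a nonzero vector. If $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ form a basis for the nullspace of $A$,
then every solution of $A\mathbf{x} = \mathbf{b}$ can be written in the form

$$\mathbf{x} = \mathbf{x}_0 + c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_k\mathbf{v}_k. \tag{4.20}$$

Conversely, for all choices of scalars $c_1, c_2, \ldots, c_k$, the vector $\mathbf{x}$ in (4.20) is a solution
of $A\mathbf{x} = \mathbf{b}$.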
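Proof Let $\mathbf{y}$ be any other solution of $A\mathbf{x} = \mathbf{b}$, i.e.,
$$A\mathbf{y} = \mathbf{b}.$$
We already know that $A\mathbf{x}_0 = \mathbf{b}$. Therefore,
$$A(\mathbf{y} - \mathbf{x}_0) = A\mathbf{y} - A\mathbf{x}_0 = \mathbf{b} - \mathbf{b} = \mathbf{0},$$
which implies that $\mathbf{y} - \mathbf{x}_0$ is a solution of $A\mathbf{x} = \mathbf{0}$, i.e., $\mathbf{y} - \mathbf{x}_0$ is in the nullspace of
$A$. Since $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ form a basis for the nullspace of $A$, we have

$$\mathbf{y} - \mathbf{x}_0 \in \operatorname{span}\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}.$$

It follows that

$$\mathbf{y} - \mathbf{x}_0 = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_k\mathbf{v}_k,$$

where $c_1, c_2, \ldots, c_k$ are scalars. Thus,
$$\mathbf{y} = \mathbf{x}_0 + c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_k\mathbf{v}_k.$$
Conversely, for any choice of scalars $c_1, c_2, \ldots, c_k$, we can construct a vector

$$\mathbf{z} = \mathbf{x}_0 + c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_k\mathbf{v}_k.$$

Multiplying both sides by $A$ yields

$$A\mathbf{z} = A\mathbf{x}_0 + c_1A\mathbf{v}_1 + c_2A\mathbf{v}_2 + \cdots + c_kA\mathbf{v}_k = \mathbf{b} + \mathbf{0} + \mathbf{0} + \cdots + \mathbf{0} = \mathbf{b}.$$

Therefore, $\mathbf{z}$ is a solution of $A\mathbf{x} = \mathbf{b}$.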


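Remark There is some terminology associated with (4.20). The vector $\mathbf{x}_0$ is called
a particular solution of $A\mathbf{x} = \mathbf{b}$. The expression

$$\mathbf{x}_0 + c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_k\mathbf{v}_k$$
is called the general solution of $A\mathbf{x} = \mathbf{b}$, and the expression
$$c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_k\mathbf{v}_k$$
is called the general solution of $A\mathbf{x} = \mathbf{0}$.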


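Example Solve the linear system

$$\begin{aligned}
x_1 - 2x_2 - 3x_3 \phantom{{}+x_4} - 3x_5 \phantom{{}+x_6} &= -4 \\
-x_1 + 2x_2 + 4x_3 + x_4 + 4x_5 - 2x_6 &= 2 \\
3x_3 + 3x_4 + 3x_5 + x_6 &= 1 \\
2x_1 - 4x_2 \phantom{{}+x_3} + 6x_4 \phantom{{}+x_5} + 3x_6 &= -5
\end{aligned} \tag{4.21}$$

and obtain
$$x_1 = 2r - 3s - 4,\quad x_2 = r,\quad x_3 = -s - t,\quad x_4 = s,\quad x_5 = t,\quad x_6 = 1.$$
This result can be written in vector form as

$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix}
= \begin{bmatrix} 2r - 3s - 4 \\ r \\ -s - t \\ s \\ t \\ 1 \end{bmatrix}
= \underbrace{\begin{bmatrix} -4 \\ 0 \\ 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}}_{\mathbf{x}_0}
+ \underbrace{r\begin{bmatrix} 2 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ s\begin{bmatrix} -3 \\ 0 \\ -1 \\ 1 \\ 0 \\ 0 \end{bmatrix}
+ t\begin{bmatrix} 0 \\ 0 \\ -1 \\ 0 \\ 1 \\ 0 \end{bmatrix}}_{\mathbf{y}}, \tag{4.22}$$

which is the general solution of (4.21). The vector $\mathbf{x}_0$ in (4.22) is a particular solution
of (4.21) and the linear combination $\mathbf{y}$ in (4.22) is the general solution of

$$\begin{aligned}
x_1 - 2x_2 - 3x_3 \phantom{{}+x_4} - 3x_5 \phantom{{}+x_6} &= 0 \\
-x_1 + 2x_2 + 4x_3 + x_4 + 4x_5 - 2x_6 &= 0 \\
3x_3 + 3x_4 + 3x_5 + x_6 &= 0 \\
2x_1 - 4x_2 \phantom{{}+x_3} + 6x_4 \phantom{{}+x_5} + 3x_6 &= 0.
\end{aligned}$$

Remark In fact, for a consistent linear system $A\mathbf{x} = \mathbf{b}$, the number of free
variables is equal to the number of parameters in the general solution of the system.
Thus, the number of those parameters is equal to the number of vectors in a basis
for the nullspace of $A$.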
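
The structure "particular solution plus nullspace" from Theorem 4.16 can be checked numerically on the example above. The following is a minimal sketch, assuming the coefficients of system (4.21) as transcribed in the comments; the helper names `matvec` and `general_solution` are illustrative and not part of the text.

```python
# System (4.21) written as A x = b; entries transcribed from the example.
A = [
    [ 1, -2, -3, 0, -3,  0],
    [-1,  2,  4, 1,  4, -2],
    [ 0,  0,  3, 3,  3,  1],
    [ 2, -4,  0, 6,  0,  3],
]
b = [-4, 2, 1, -5]

x0 = [-4, 0, 0, 0, 0, 1]      # particular solution from the example
null_basis = [
    [ 2, 1,  0, 0, 0, 0],     # vector multiplied by r
    [-3, 0, -1, 1, 0, 0],     # vector multiplied by s
    [ 0, 0, -1, 0, 1, 0],     # vector multiplied by t
]

def matvec(M, x):
    """Matrix-vector product using plain Python lists."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def general_solution(r, s, t):
    """Return x0 + r*v1 + s*v2 + t*v3, the form given in (4.20)."""
    x = list(x0)
    for c, v in zip((r, s, t), null_basis):
        x = [xi + c * vi for xi, vi in zip(x, v)]
    return x

# x0 solves A x = b, each basis vector solves A x = 0,
# and any choice of parameters again solves A x = b.
print(matvec(A, x0) == b)
print(all(matvec(A, v) == [0, 0, 0, 0] for v in null_basis))
print(matvec(A, general_solution(5, -2, 7)) == b)
```

The three parameters r, s, t correspond to the three free variables, which matches the count of basis vectors for the nullspace.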


4.5.3 Bases for three spaces 


It is well known that elementary row operations do not change the solution set
of the linear system $A\mathbf{x} = \mathbf{0}$. Thus, we have the following theorem.

Theorem 4.17 Elementary row operations do not change the nullspace of a matrix $A$.

Moreover, the following theorem is concerned with the row space of a matrix $A$.

Theorem 4.18 Elementary row operations do not change the row space of a matrix $A$.

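Proof Assume that $B$ is a matrix obtained from $A$ by performing an elementary
row operation on $A$. Let

$$\text{row space of } A = \operatorname{span}\{\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_n\}, \qquad \text{row space of } B = \operatorname{span}\{\mathbf{r}'_1, \mathbf{r}'_2, \ldots, \mathbf{r}'_n\},$$

where $\{\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_n\}$ is the set of row vectors of $A$ and $\{\mathbf{r}'_1, \mathbf{r}'_2, \ldots, \mathbf{r}'_n\}$ is the set of
row vectors of $B$. We want to show that

$$\operatorname{span}\{\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_n\} = \operatorname{span}\{\mathbf{r}'_1, \mathbf{r}'_2, \ldots, \mathbf{r}'_n\}.$$

For any row vector $\mathbf{r}'_i$ of $B$, corresponding to the three kinds of elementary row
operations performed on $A$, we consider the following three cases:

$\mathbf{r}'_i = \mathbf{r}_j$, where $\mathbf{r}_j$ is the $j$th row vector of $A$;

$\mathbf{r}'_i = c\,\mathbf{r}_i$, where $c$ is a nonzero scalar and $\mathbf{r}_i$ is the $i$th row vector of $A$;

$\mathbf{r}'_i = \mathbf{r}_i + k\,\mathbf{r}_j$, where $k$ is a scalar and $\mathbf{r}_p$ is the $p$th row vector of $A$ for $p = i, j$.

We then have for all $i$,
$$\mathbf{r}'_i \in \operatorname{span}\{\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_n\}.$$

Therefore,
$$\operatorname{span}\{\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_n\} \supseteq \operatorname{span}\{\mathbf{r}'_1, \mathbf{r}'_2, \ldots, \mathbf{r}'_n\}. \tag{4.23}$$

Since $A$ can be obtained from $B$ by performing inverse elementary row operations
on $B$, one can show similarly that

$$\operatorname{span}\{\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_n\} \subseteq \operatorname{span}\{\mathbf{r}'_1, \mathbf{r}'_2, \ldots, \mathbf{r}'_n\}. \tag{4.24}$$

It follows from (4.23) and (4.24) that

$$\operatorname{span}\{\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_n\} = \operatorname{span}\{\mathbf{r}'_1, \mathbf{r}'_2, \ldots, \mathbf{r}'_n\}.$$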


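Remark However, elementary row operations can change the column space of a
matrix $A$. For instance, consider

$$A = \begin{bmatrix} 1 & 3 \\ 2 & 6 \end{bmatrix}.$$

If we add $-2$ times the first row of $A$ to the second row, we obtain

$$B = \begin{bmatrix} 1 & 3 \\ 0 & 0 \end{bmatrix}.$$

Note that
$$\text{column space of } A = \operatorname{span}\{[1, 2]^T\} \neq \operatorname{span}\{[1, 0]^T\} = \text{column space of } B.$$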


Although elementary row operations can change the column space of a matrix,
any relationship of linear independence or linear dependence that exists among
the column vectors of a matrix before a row operation also holds for the
corresponding columns of the matrix that results from that row operation. More
precisely, we have the following result.


Theorem 4.19 Let E be any elementary matrix. Then a given set of column vectors 
of A is linearly independent if and only if the corresponding column vectors of EA 
are linearly independent. 


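Proof Let $E$ be any elementary matrix and

$$A = [\,\mathbf{c}_1 \mid \mathbf{c}_2 \mid \cdots \mid \mathbf{c}_n\,],$$
where $\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_n$ are the column vectors of $A$. Then
$$EA = [\,E\mathbf{c}_1 \mid E\mathbf{c}_2 \mid \cdots \mid E\mathbf{c}_n\,].$$

Without loss of generality, we consider the following equations:

$$\sum_{i=1}^{r} k_i\mathbf{c}_i = \mathbf{0} \quad \text{and} \quad \sum_{i=1}^{r} k_iE\mathbf{c}_i = \mathbf{0},$$

where $r \le n$. In fact, since $E$ is invertible,

$$\sum_{i=1}^{r} k_i\mathbf{c}_i = \mathbf{0} \iff E\sum_{i=1}^{r} k_i\mathbf{c}_i = \mathbf{0} \iff \sum_{i=1}^{r} k_iE\mathbf{c}_i = \mathbf{0}. \tag{4.25}$$

Thus, the given set of column vectors of $A$ is linearly independent if and only if the
set of corresponding column vectors of $EA$ is linearly independent.

Remark In fact, (4.25) implies an even deeper result: every linear relation that
holds among the column vectors of $A$ also holds among the corresponding column
vectors of $EA$.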


Theorem 4.20 Let R be a matrix in row-echelon form. Then the row vectors with 
the leading 1’s form a basis for the row space of R, and the column vectors with the 
leading 1’s of the row vectors form a basis for the column space of R. 


The result of Theorem 4.20 is virtually self-evident and the proof of the theorem is 
left as an exercise. 


Remark Theorem 4.20 makes it possible to find bases for the row and column 
spaces of a matrix in row-echelon form by inspection. 


4.5.4 A procedure for finding a basis for span(S) 


Let $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\} \subset \mathbb{R}^n$. Then by the following procedure, one can find a
basis for span($S$) and simultaneously express the vectors in $S$ as linear combinations
of the basis vectors.

(1) Form the matrix $A$ having $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ as its column vectors.

(2) Reduce $A$ to its reduced row-echelon form $R$, and let $\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_k$ be the
column vectors of $R$.

(3) Identify the columns that contain the leading 1's in $R$. The corresponding
column vectors of $A$ are the basis vectors for span($S$).

(4) Express each column vector $\mathbf{w}_j$ of $R$ that does not contain a leading 1 as a
linear combination of preceding column vectors that do contain leading 1's.

(5) In each linear combination obtained in (4), replace $\mathbf{w}_j$ with $\mathbf{v}_j$ for $j = 1, 2, \ldots, k$.

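Example Let $\mathbf{v}_1 = [2, -1, 1, 0]$, $\mathbf{v}_2 = [-4, 2, -2, 0]$, $\mathbf{v}_3 = [1, 0, -2, 1]$,
$\mathbf{v}_4 = [0, 7, -2, 3]$, and $\mathbf{v}_5 = [3, 5, 2, 2]$.

(a) Find a subset of $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4, \mathbf{v}_5\}$ that forms a basis for
span$\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4, \mathbf{v}_5\}$.

(b) Express each vector not in the basis as a linear combination of the basis vectors.

Solution For (a), we begin by constructing a matrix $A$ that has $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4$, and
$\mathbf{v}_5$ as its column vectors:

$$A = \begin{bmatrix} 2 & -4 & 1 & 0 & 3 \\ -1 & 2 & 0 & 7 & 5 \\ 1 & -2 & -2 & -2 & 2 \\ 0 & 0 & 1 & 3 & 2 \end{bmatrix}.$$

We reduce the matrix $A$ to its reduced row-echelon form $R$ and denote the column
vectors of the resulting matrix by $\mathbf{w}_1, \mathbf{w}_2, \mathbf{w}_3, \mathbf{w}_4$, and $\mathbf{w}_5$. We obtain

$$R = \begin{bmatrix} 1 & -2 & 0 & 0 & 2 \\ 0 & 0 & 1 & 0 & -1 \\ 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} = [\,\mathbf{w}_1 \mid \mathbf{w}_2 \mid \mathbf{w}_3 \mid \mathbf{w}_4 \mid \mathbf{w}_5\,].$$

The leading 1's occur in columns 1, 3, and 4. It follows from Theorem 4.20 that
$\{\mathbf{w}_1, \mathbf{w}_3, \mathbf{w}_4\}$ forms a basis for the column space of $R$. Consequently, $\{\mathbf{v}_1, \mathbf{v}_3, \mathbf{v}_4\}$ is
a basis for the column space of $A$ by Theorem 4.19.

For (b), we have the following linear combinations by inspection of $R$:
$$\mathbf{w}_2 = -2\mathbf{w}_1, \qquad \mathbf{w}_5 = 2\mathbf{w}_1 - \mathbf{w}_3 + \mathbf{w}_4.$$

The corresponding relationships in $A$ are

$$\mathbf{v}_2 = -2\mathbf{v}_1, \qquad \mathbf{v}_5 = 2\mathbf{v}_1 - \mathbf{v}_3 + \mathbf{v}_4.$$
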
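
The five-step procedure of Section 4.5.4 can be sketched in code. The block below is an illustrative implementation, not the book's own: it computes the reduced row-echelon form with exact fractions, reads off the pivot columns (steps 1 to 3), and reads the dependency coefficients from the non-pivot columns (steps 4 and 5). The function names `rref` and `basis_and_relations` are assumptions made for this sketch, and column indices are 0-based.

```python
from fractions import Fraction

def rref(rows):
    """Return (R, pivots): the reduced row-echelon form and its pivot columns."""
    M = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(M), len(M[0])
    pivots, r = [], 0
    for c in range(n_cols):
        # Find a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, n_rows) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        M[r] = [x / M[r][c] for x in M[r]]          # make the leading entry 1
        for i in range(n_rows):
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
        if r == n_rows:
            break
    return M, pivots

def basis_and_relations(vectors):
    """Steps (1)-(5): pivot indices give the basis; relations express the rest."""
    A = [list(row) for row in zip(*vectors)]        # vectors become columns
    R, pivots = rref(A)
    relations = {}
    for j in range(len(vectors)):
        if j not in pivots:
            # Column j of R equals sum over i of R[i][j] * (i-th pivot column).
            relations[j] = {pivots[i]: R[i][j]
                            for i in range(len(pivots)) if R[i][j] != 0}
    return pivots, relations

vs = [[2, -1, 1, 0], [-4, 2, -2, 0], [1, 0, -2, 1], [0, 7, -2, 3], [3, 5, 2, 2]]
pivots, relations = basis_and_relations(vs)
print(pivots)      # pivot columns 0, 2, 3, i.e. v1, v3, v4
print(relations)   # v2 in terms of v1; v5 in terms of v1, v3, v4
```

Exact `Fraction` arithmetic is used so that the test for a zero entry is reliable; with floating-point numbers a tolerance would be needed instead.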
4.6 Rank and Nullity 


For a given matrix A, we have the following four fundamental matrix spaces: 
(1) row space of $A$;

(2) column space of $A$;

(3) nullspace of $A$;

(4) nullspace of $A^T$.

In this section, we are concerned with relationships between the dimensions of these
four vector spaces. The results obtained here are fundamental and will provide a
deeper insight into the relationship between a linear system and its coefficient matrix.

4.6.1 Rank and nullity

Theorem 4.21 Let $A$ be any matrix. Then the row space and column space of $A$
have the same dimension.


Proof Let R be the reduced row-echelon form of A. It follows from Theorems 4.18 
and 4.20 that 


dim(row space of A) = dim(row space of R) = number of leading 1’s (4.26) 
and it follows from Theorems 4.19 and 4.20 that 


dim(column space of A) = dim(column space of R) = number of leading 1’s. 
(4.27) 
Thus, we have by (4.26) and (4.27), 


dim(row space of A) = dim(column space of A). 


Definition The common dimension of the row space and column space of a matrix 
A is called the rank of A and is denoted by rank(A). The dimension of the nullspace 
of A is called the nullity of A and is denoted by nullity(A). 


Theorem 4.22 Let $A$ be any matrix. Then $\operatorname{rank}(A) = \operatorname{rank}(A^T)$.

Proof We have

$$\operatorname{rank}(A) = \dim(\text{row space of } A) = \dim(\text{column space of } A^T) = \operatorname{rank}(A^T).$$


Theorem 4.23 (Dimension Theorem for Matrices) Let A be a matrix with n 
columns. Then 


rank(A) + nullity(A) = n. 


Proof Since $A$ has $n$ columns, the homogeneous linear system $A\mathbf{x} = \mathbf{0}$ has $n$
variables. These variables fall into two categories: the leading variables and the
free variables. Then

$$[\text{number of leading variables}] + [\text{number of free variables}] = n,$$

that is,

$$[\text{number of leading 1's}] + [\text{number of free variables}] = n.$$

Thus,
$$\operatorname{rank}(A) + [\text{number of free variables}] = n.$$
We recall that the number of free variables is equal to the nullity of $A$. This is so
because the nullity of $A$ is the dimension of the solution space of $A\mathbf{x} = \mathbf{0}$, which is
the same as the number of parameters in the general solution, which is the same as
the number of free variables. Thus,

$$\operatorname{rank}(A) + \operatorname{nullity}(A) = n.$$
Example Find the rank and nullity of the matrix 
2 -8 1 3-4 


Solution Consider solving the linear system Ax = 0. The reduced row-echelon 
form of A is 


1 -4 0 5 -3 
0 0 1 -7 2 
0 0 0 0 0 
0 0 0 0 0 


Since there are two nonzero rows (or equivalently, two leading 1's), the row space
and column space are both two-dimensional, i.e., rank($A$) = 2. The corresponding
system is

$$\begin{aligned} x_1 - 4x_2 + 5x_4 - 3x_5 &= 0 \\ x_3 - 7x_4 + 2x_5 &= 0 \end{aligned}
\quad\Longrightarrow\quad
\begin{aligned} x_1 &= 4x_2 - 5x_4 + 3x_5 \\ x_3 &= 7x_4 - 2x_5. \end{aligned}$$

It follows that the general solution of $A\mathbf{x} = \mathbf{0}$ is

$$x_1 = 4r - 5s + 3t,\quad x_2 = r,\quad x_3 = 7s - 2t,\quad x_4 = s,\quad x_5 = t,$$

or equivalently,

$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix}
= r\begin{bmatrix} 4 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ s\begin{bmatrix} -5 \\ 0 \\ 7 \\ 1 \\ 0 \end{bmatrix}
+ t\begin{bmatrix} 3 \\ 0 \\ -2 \\ 0 \\ 1 \end{bmatrix}. \tag{4.28}$$

The three vectors on the right-hand side of (4.28) form a basis for the solution space
of $A\mathbf{x} = \mathbf{0}$. Therefore, nullity($A$) = 3.

Remark Let $A$ be an $m \times n$ matrix and rank($A$) = $r$. Then rank($A$) $\le \min\{m, n\}$
and we have the following table relating the dimensions of the four fundamental
spaces of $A$.

Fundamental Space    | Dimension
Row space of $A$       | $r$
Column space of $A$    | $r$
Nullspace of $A$       | $n - r$
Nullspace of $A^T$     | $m - r$
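
The rank and nullity example above can be reproduced directly from the reduced row-echelon form, since row operations change neither the row space nor the nullspace (Theorems 4.17 and 4.18). The sketch below assumes its input is already in reduced row-echelon form; the names `pivot_columns` and `nullspace_basis` are illustrative only.

```python
def pivot_columns(R):
    """Column index of the leading 1 in each nonzero row of an RREF matrix."""
    pivots = []
    for row in R:
        lead = next((j for j, x in enumerate(row) if x != 0), None)
        if lead is not None:
            pivots.append(lead)
    return pivots

def nullspace_basis(R):
    """One basis vector per free variable, read off from the RREF matrix R."""
    n = len(R[0])
    pivots = pivot_columns(R)
    free = [j for j in range(n) if j not in pivots]
    basis = []
    for f in free:
        v = [0] * n
        v[f] = 1                       # set this free variable to 1, others to 0
        for i, p in enumerate(pivots):
            v[p] = -R[i][f]            # solve row i for its leading variable
        basis.append(v)
    return basis

# Reduced row-echelon form from the example.
R = [
    [1, -4, 0,  5, -3],
    [0,  0, 1, -7,  2],
    [0,  0, 0,  0,  0],
    [0,  0, 0,  0,  0],
]

rank = len(pivot_columns(R))
basis = nullspace_basis(R)
nullity = len(basis)
print(rank, nullity, rank + nullity == len(R[0]))
print(basis)
```

The basis vectors produced this way are the three vectors in (4.28), and the count of pivot columns plus the count of free columns equals the number of columns, as the Dimension Theorem for Matrices states.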


4.6.2 Rank for matrix operations 


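Theorem 4.24 For any $n \times n$ matrices $A$ and $B$, we have
(a) $\operatorname{rank}(A + B) \le \operatorname{rank}(A) + \operatorname{rank}(B)$.
(b) $\operatorname{rank}(AB) \le \min\{\operatorname{rank}(A), \operatorname{rank}(B)\}$.
(c) $\operatorname{rank}(PAQ) = \operatorname{rank}(A)$, where $P$ and $Q$ are invertible matrices.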


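Proof We only prove (b). The proofs of (a) and (c) are left as an exercise. Let

$$A = [\,\mathbf{a}_1 \mid \mathbf{a}_2 \mid \cdots \mid \mathbf{a}_n\,], \qquad B = [\,\mathbf{b}_1 \mid \mathbf{b}_2 \mid \cdots \mid \mathbf{b}_n\,],$$
where $\mathbf{a}_k$ and $\mathbf{b}_k$ ($1 \le k \le n$) are the column vectors of $A$ and $B$, respectively. Let
$\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_n$ denote the column vectors of $AB$. Then

$$AB = [\,\mathbf{c}_1 \mid \mathbf{c}_2 \mid \cdots \mid \mathbf{c}_n\,] = A[\,\mathbf{b}_1 \mid \mathbf{b}_2 \mid \cdots \mid \mathbf{b}_n\,] = [\,A\mathbf{b}_1 \mid A\mathbf{b}_2 \mid \cdots \mid A\mathbf{b}_n\,].$$

Each $\mathbf{c}_k = A\mathbf{b}_k$ is a linear combination of the column vectors of $A$. Thus,

$$\operatorname{span}\{\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_n\} \subseteq \operatorname{span}\{\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_n\}.$$

We therefore have

$$\operatorname{rank}(AB) = \dim(\operatorname{span}\{\mathbf{c}_1, \ldots, \mathbf{c}_n\}) \le \dim(\operatorname{span}\{\mathbf{a}_1, \ldots, \mathbf{a}_n\}) = \operatorname{rank}(A). \tag{4.29}$$

Moreover, it follows by Theorem 4.22 and (4.29) that
$$\operatorname{rank}(AB) = \operatorname{rank}((AB)^T) = \operatorname{rank}(B^TA^T) \le \operatorname{rank}(B^T) = \operatorname{rank}(B). \tag{4.30}$$
Combining (4.29) and (4.30), we obtain

$$\operatorname{rank}(AB) \le \min\{\operatorname{rank}(A), \operatorname{rank}(B)\}.$$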
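Example For any square matrices $A$ and $B$ of the same size, show that
$$\operatorname{rank}(I - AB) \le \operatorname{rank}(I - A) + \operatorname{rank}(I - B),$$
where $I$ is the identity matrix.
Proof We have by Theorem 4.24 (a),
$$\operatorname{rank}(I - AB) = \operatorname{rank}(I - A + A - AB) \le \operatorname{rank}(I - A) + \operatorname{rank}(A - AB).$$
Moreover, it follows from Theorem 4.24 (b) that
$$\operatorname{rank}(A - AB) = \operatorname{rank}(A(I - B)) \le \min\{\operatorname{rank}(A), \operatorname{rank}(I - B)\} \le \operatorname{rank}(I - B).$$

Thus, the proof is completed.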
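
The inequalities of Theorem 4.24 and of the example above can be spot-checked on concrete matrices. The following sketch computes rank by Gaussian elimination over exact fractions; the matrices `A` and `B` are hypothetical test data chosen for illustration, and a numerical check of this kind illustrates the statements without proving them.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix via Gaussian elimination with exact arithmetic."""
    M = [[Fraction(x) for x in row] for row in rows]
    rk = 0
    for c in range(len(M[0])):
        p = next((i for i in range(rk, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue
        M[rk], M[p] = M[p], M[rk]
        for i in range(rk + 1, len(M)):
            f = M[i][c] / M[rk][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[rk])]
        rk += 1
        if rk == len(M):
            break
    return rk

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def add(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def sub(X, Y):
    return [[a - b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

# Hypothetical sample matrices.
A = [[1, 2, 0], [0, 1, 0], [0, 0, 0]]
B = [[1, 0, 0], [0, 0, 0], [0, 0, 1]]
I = identity(3)
AB = matmul(A, B)

print(rank(add(A, B)) <= rank(A) + rank(B))                    # Theorem 4.24 (a)
print(rank(AB) <= min(rank(A), rank(B)))                       # Theorem 4.24 (b)
print(rank(sub(I, AB)) <= rank(sub(I, A)) + rank(sub(I, B)))   # the example
```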


4.6.3 Consistency theorems 


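The following theorem gives conditions under which a linear system is consistent.

Theorem 4.25 Let $A\mathbf{x} = \mathbf{b}$ be a linear system of $m$ equations in $n$ unknowns. Then
the following are equivalent.

(a) $A\mathbf{x} = \mathbf{b}$ is consistent.

(b) $\mathbf{b}$ is in the column space of $A$.

(c) $\operatorname{rank}(A) = \operatorname{rank}([\,A \mid \mathbf{b}\,])$, where $[\,A \mid \mathbf{b}\,]$ is the augmented matrix.

Proof (a) $\Leftrightarrow$ (b): See Theorem 4.15.

(b) $\Leftrightarrow$ (c): Let

$$A = [\,\mathbf{c}_1 \mid \mathbf{c}_2 \mid \cdots \mid \mathbf{c}_n\,],$$
where $\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_n$ are the column vectors of $A$. We have by Theorem 4.10 (b),

$$\begin{aligned}
\mathbf{b} \in \operatorname{span}\{\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_n\}
&\iff \mathbf{b} = \sum_{i=1}^{n} k_i\mathbf{c}_i \\
&\iff \operatorname{span}\{\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_n\} = \operatorname{span}\{\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_n, \mathbf{b}\} \\
&\iff \dim(\operatorname{span}\{\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_n\}) = \dim(\operatorname{span}\{\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_n, \mathbf{b}\}) \\
&\iff \operatorname{rank}(A) = \operatorname{rank}([\,A \mid \mathbf{b}\,]),
\end{aligned}$$

where $[\,A \mid \mathbf{b}\,] = [\,\mathbf{c}_1 \mid \mathbf{c}_2 \mid \cdots \mid \mathbf{c}_n \mid \mathbf{b}\,]$.

Corollary Let $A\mathbf{x} = \mathbf{b}$ be a linear system of $m$ equations in $n$ unknowns. Then
$A\mathbf{x} = \mathbf{b}$ has a unique solution if and only if $\operatorname{rank}(A) = \operatorname{rank}([\,A \mid \mathbf{b}\,]) = n$.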


The proof is left as an exercise (see Exercise 4.26). 
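
The rank conditions in Theorem 4.25 and its corollary translate directly into a small test. The sketch below is one possible way to do this, using exact elimination; the function names `is_consistent` and `has_unique_solution` and the sample systems are hypothetical.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix via Gaussian elimination with exact arithmetic."""
    M = [[Fraction(x) for x in row] for row in rows]
    rk = 0
    for c in range(len(M[0])):
        p = next((i for i in range(rk, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue
        M[rk], M[p] = M[p], M[rk]
        for i in range(rk + 1, len(M)):
            f = M[i][c] / M[rk][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[rk])]
        rk += 1
        if rk == len(M):
            break
    return rk

def augmented(A, b):
    """Form the augmented matrix [A | b]."""
    return [list(row) + [bi] for row, bi in zip(A, b)]

def is_consistent(A, b):
    # Theorem 4.25 (c): rank(A) = rank([A | b]).
    return rank(A) == rank(augmented(A, b))

def has_unique_solution(A, b):
    # Corollary: rank(A) = rank([A | b]) = n, the number of unknowns.
    n = len(A[0])
    return rank(A) == rank(augmented(A, b)) == n

A1 = [[1, 2], [2, 4]]              # rank 1
A2 = [[1, 0], [0, 1], [1, 1]]      # rank 2, three equations in two unknowns

print(is_consistent(A1, [1, 2]), has_unique_solution(A1, [1, 2]))
print(is_consistent(A1, [1, 3]))
print(is_consistent(A2, [1, 2, 3]), has_unique_solution(A2, [1, 2, 3]))
```

In the first system the second equation is twice the first, so it is consistent with a free variable; changing the right-hand side to [1, 3] raises the rank of the augmented matrix and makes the system inconsistent.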


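The following theorem gives conditions under which a linear system is consistent
for every choice of $\mathbf{b}$.

Theorem 4.26 Let $A\mathbf{x} = \mathbf{b}$ be a linear system of $m$ equations in $n$ unknowns. Then
the following are equivalent.

(a) $A\mathbf{x} = \mathbf{b}$ is consistent for every $m \times 1$ matrix $\mathbf{b}$.
(b) The column vectors of $A$ span $\mathbb{R}^m$.
(c) $\operatorname{rank}(A) = m$.

Proof Let
$$A = [\,\mathbf{c}_1 \mid \mathbf{c}_2 \mid \cdots \mid \mathbf{c}_n\,],$$
where $\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_n$ are the column vectors of $A$.

(a) $\Rightarrow$ (b): We want to show that

$$\operatorname{span}\{\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_n\} = \mathbb{R}^m.$$
Since every $\mathbf{c}_j \in \mathbb{R}^m$ for $j = 1, 2, \ldots, n$, we have
$$\operatorname{span}\{\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_n\} \subseteq \mathbb{R}^m.$$

On the other hand, it follows from (a) and Theorem 4.15 that for every $\mathbf{b} \in \mathbb{R}^m$,
$$\mathbf{b} \in \operatorname{span}\{\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_n\}.$$

Thus,

$$\operatorname{span}\{\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_n\} = \mathbb{R}^m.$$

(b) $\Rightarrow$ (c): We have

$$\operatorname{rank}(A) = \dim(\operatorname{span}\{\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_n\}) = \dim(\mathbb{R}^m) = m.$$

(c) $\Rightarrow$ (a): Since

$$\operatorname{span}\{\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_n\} \subseteq \mathbb{R}^m,$$

and also
$$\dim(\operatorname{span}\{\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_n\}) = \operatorname{rank}(A) = m = \dim(\mathbb{R}^m),$$

we obtain by Theorem 4.13,

$$\operatorname{span}\{\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_n\} = \mathbb{R}^m.$$

It follows from Theorem 4.15 again that $A\mathbf{x} = \mathbf{b}$ is consistent for any $\mathbf{b} \in \mathbb{R}^m$.

Theorem 4.27 Let $A$ be an $m \times n$ matrix. Then the following are equivalent.
(a) $A\mathbf{x} = \mathbf{0}$ has only the trivial solution.
(b) The column vectors of $A$ are linearly independent.
(c) $A\mathbf{x} = \mathbf{b}$ has at most one solution (none or one) for every $m \times 1$ matrix $\mathbf{b}$.
The proof of the theorem is left as an exercise.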
4.6.4 Summary 


Theorem 4.28 Let $A$ be an $n \times n$ matrix and $T_A : \mathbb{R}^n \to \mathbb{R}^n$ be multiplication by
$A$. Then the following are equivalent.

(a) $A$ is invertible.
(b) $A\mathbf{x} = \mathbf{0}$ has only the trivial solution.
(c) The reduced row-echelon form of $A$ is $I_n$.
(d) $A$ is expressible as a product of elementary matrices.
(e) $A\mathbf{x} = \mathbf{b}$ is consistent for every $n \times 1$ matrix $\mathbf{b}$.
(f) $A\mathbf{x} = \mathbf{b}$ has exactly one solution for every $n \times 1$ matrix $\mathbf{b}$.
(g) $\det(A) \neq 0$.
(h) The range of $T_A$ is $\mathbb{R}^n$.
(i) $T_A$ is one-to-one.
(j) The column vectors of $A$ are linearly independent.
(k) The row vectors of $A$ are linearly independent.
(l) The column vectors of $A$ span $\mathbb{R}^n$.
(m) The row vectors of $A$ span $\mathbb{R}^n$.
(n) The column vectors of $A$ form a basis for $\mathbb{R}^n$.
(o) The row vectors of $A$ form a basis for $\mathbb{R}^n$.
(p) $\operatorname{rank}(A) = n$.
(q) $\operatorname{nullity}(A) = 0$.

Exercises


Elementary exercises 
4.1 Prove Theorem 4.1 (b), (c), and (d). 


4.2 Let $W$ and $U$ be two subspaces of a vector space $V$. Show that $W \cap U$ and
$W + U$ are subspaces of $V$.


4.3 Use Theorem 4.2 to determine which of the following are subspaces. 
(a) The set of all polynomials $a_0 + a_1x + a_2x^2 + a_3x^3$ for which $a_0 + a_1 + a_2 + a_3 = 0$.

(b) The set of all polynomials $a_0 + a_1x + a_2x^2 + a_3x^3$ for which $a_0$, $a_1$, $a_2$, and $a_3$
are integers.

(c) The set of all polynomials $a_0 + a_1x + a_2x^2 + a_3x^3$ for which $a_1 \times a_3 = 0$.
(d) The set of all vectors in $\mathbb{R}^n$ with the first coordinate component nonzero.
(e) The set of all diagonal matrices in $\mathbb{R}^{n \times n}$.

(f) The set of all vectors $\mathbf{x}$ in $\mathbb{R}^n$ such that $A\mathbf{x} = \mathbf{b}$, where $A \in \mathbb{R}^{m \times n}$ and $\mathbf{b} \neq \mathbf{0}$.

(g) The set of all differentiable functions $f = f(x)$ in $F(-\infty, +\infty)$ that satisfy
$$\frac{df(x)}{dx} = 0.$$
4.4 Express the following vectors as linear combinations of u = [2,1,4], v = 
[1,-1, 3], and w = [3, 2,5]. 
(a) [-9, -7, —15]. (b) [1,0,3]. 


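4.5 Determine whether the vector $\mathbf{v} = [0, 5, 6, -3]$ is contained in the subspace
spanned by $\mathbf{u}_1$, $\mathbf{u}_2$, $\mathbf{u}_3$, and $\mathbf{u}_4$, where

$$\mathbf{u}_1 = [-1, 3, 2, 0],\quad \mathbf{u}_2 = [2, 0, 4, -1],\quad \mathbf{u}_3 = [7, 1, 1, 4],\quad \mathbf{u}_4 = [6, 3, 1, 2].$$

4.6 Let $A_{(1)}$, $A_{(2)}$, $A_{(3)}$, and $B$ be matrices in $\mathbb{R}^{2 \times 2}$, where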


1 2 0 1 2 0 1 5 
A m A => B= . 
pal (2) ei (3) l ae |; i 


Determine whether $B$ is contained in span$\{A_{(1)}, A_{(2)}, A_{(3)}\}$.

4.7 Let $W$ be the set of all vectors of each given form, where $a$, $b$, and $c$ represent
arbitrary real numbers. Determine whether $W$ is a subspace of $\mathbb{R}^4$. If so, find a set
$S$ of vectors that spans $W$.

(a) $[2a + 3b, -1, 2a - 5b, 5a]$. (b) $[2a - b, 3b - c, 3c - a, 3b]$.
4.8 In each part, determine whether the given vectors span $\mathbb{R}^3$.
(a) $\mathbf{v}_1 = [2, -1, 3]$, $\mathbf{v}_2 = [4, 1, 2]$, $\mathbf{v}_3 = [8, -1, 8]$.
(b) $\mathbf{v}_1 = [1, 2, 6]$, $\mathbf{v}_2 = [8, 4, 1]$, $\mathbf{v}_3 = [4, 3, 1]$, $\mathbf{v}_4 = [3, 3, 1]$.
4.9 Prove Theorem 4.4.
4.10 Let $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ be linearly independent vectors in $\mathbb{R}^n$. Show that
(a) $\{\mathbf{u}, \mathbf{u} + \mathbf{v}, \mathbf{u} + \mathbf{v} + \mathbf{w}\}$ is linearly independent.
(b) $\{\mathbf{u} + \mathbf{v}, \mathbf{u} + \mathbf{w}, \mathbf{v} + \mathbf{w}\}$ is linearly independent.
(c) $\{\mathbf{u} - \mathbf{v}, \mathbf{u} - \mathbf{w}, \mathbf{v} - \mathbf{w}\}$ is linearly dependent.
4.11 Determine whether each set of vectors is a basis for the given vector space.
(a) $\mathbf{u}_1 = [2, 1, 3]$, $\mathbf{u}_2 = [1, 1, 0]$, $\mathbf{u}_3 = [2, 0, 0]$ for $\mathbb{R}^3$.
(b) $\mathbf{u}_1 = [2, -3, 1]$, $\mathbf{u}_2 = [4, 1, 1]$, $\mathbf{u}_3 = [0, -7, 1]$ for $\mathbb{R}^3$.
(c) $p_1 = 2 + x^2$, $p_2 = 1 + x$, $p_3 = 3 + 2x + x^2$ for $P_2$.


0 1 


4.12 Find the coordinate vector of w relative to the basis S = {u1, ug} for R?. 


1 2 
d) Aq) = 
( ) (1) 2 0 


2 0 
, A(z) S | 0 3 | for R2*?, 


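(a) $\mathbf{u}_1 = [1, 0]$, $\mathbf{u}_2 = [0, 1]$, $\mathbf{w} = [3, -7]$.
(b) $\mathbf{u}_1 = [1, 1]$, $\mathbf{u}_2 = [0, 2]$, $\mathbf{w} = [a, b]$.
4.13 Let $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}$ be a basis for a vector space $V$, where $n > 2$.
(a) Show that the set $\{\mathbf{u}_1, \mathbf{u}_1 + \mathbf{u}_2, \ldots, \mathbf{u}_1 + \mathbf{u}_2 + \cdots + \mathbf{u}_n\}$ is also a basis for $V$.
(b) Is the set $\{\mathbf{u}_1 + \mathbf{u}_2, \mathbf{u}_2 + \mathbf{u}_3, \ldots, \mathbf{u}_{n-1} + \mathbf{u}_n, \mathbf{u}_n + \mathbf{u}_1\}$ a basis for $V$?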


4.14 Prove Theorem 4.9 (b).
4.15 Prove Theorem 4.13.
4.16 Prove Theorem 4.20.
4.17 Let $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4, \mathbf{v}_5, \mathbf{v}_6\}$, where
$\mathbf{v}_1 = [2, 1, 0, -2]$, $\mathbf{v}_2 = [4, 2, 0, -4]$, $\mathbf{v}_3 = [0, -2, 5, 5]$,
$\mathbf{v}_4 = [8, 0, 10, 2]$, $\mathbf{v}_5 = [6, 3, 0, -6]$, $\mathbf{v}_6 = [18, 0, 15, 3]$.
(a) Find a subset of S that forms a basis for the space spanned by these vectors. 


(b) Express each vector not in the basis as a linear combination of these basis 
vectors. 


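4.18 Let $A, B, C \in \mathbb{R}^{n \times n}$. Show that

(a) $\operatorname{rank}(AB) = \operatorname{rank}(B)$ if and only if the systems $(AB)\mathbf{x} = \mathbf{0}$ and $B\mathbf{x} = \mathbf{0}$ have
the same solutions.

(b) $\operatorname{rank}(ABC) = \operatorname{rank}(BC)$ if $\operatorname{rank}(AB) = \operatorname{rank}(B)$.
4.19 Let $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times p}$. If $AB = 0$, show that
$$\operatorname{rank}(A) + \operatorname{rank}(B) \le n.$$
4.20 Prove Theorem 4.24 (a) and (c).
4.21 Let $A \in \mathbb{R}^{n \times n}$. Show that $A^2 = A$ if and only if
$$\operatorname{rank}(A) + \operatorname{rank}(A - I) = n.$$
4.22 Let $A, B \in \mathbb{R}^{m \times n}$. Show that

$$\max\{\operatorname{rank}(A), \operatorname{rank}(B)\} \le \operatorname{rank}([\,A \mid B\,]) \le \operatorname{rank}(A) + \operatorname{rank}(B).$$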


4.23 Let $A$ and $B$ be any matrices. Show that

$$\operatorname{rank}(A) + \operatorname{rank}(B) = \operatorname{rank}\left(\begin{bmatrix} A & 0 \\ 0 & B \end{bmatrix}\right).$$

4.24 Let $A \in \mathbb{R}^{n \times n}$ with $\operatorname{rank}(A) = 1$.
(a) Show that $A$ can be expressed in the form

$$A = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix} [\,b_1, b_2, \ldots, b_n\,].$$

(b) Show that $A^2 = kA$, where $k$ is a scalar.


4.25 How does the rank of A vary with t? 


eo 


1 
1 
1 
t 


eRe RP Ss 
ao 


4.26 Let $A\mathbf{x} = \mathbf{b}$ be a linear system of $m$ equations in $n$ unknowns. Show that
$A\mathbf{x} = \mathbf{b}$ has a unique solution if and only if $\operatorname{rank}(A) = \operatorname{rank}([\,A \mid \mathbf{b}\,]) = n$.


4.27 Prove Theorem 4.27. 

Challenge exercises 

4.28 Let $W_1$ and $W_2$ be subspaces of a vector space $V$.
(a) Show that $W_1 \cap W_2 \subseteq W_1 \cup W_2 \subseteq W_1 + W_2$.
(b) When is $W_1 \cup W_2$ a subspace of $V$?

(c) Show that if $U$ is a subspace of $V$ containing $W_1 \cup W_2$, then $W_1 + W_2 \subseteq U$.

4.29 Let $W$ be the subspace of all $n \times n$ symmetric matrices in $\mathbb{R}^{n \times n}$. Find a basis
for and the dimension of $W$.

4.30 Let $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_k$ be vectors in $\mathbb{R}^n$. Determine whether the following
statements are true or not. If true, prove it. Otherwise, give a counterexample.

(a) If $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_k$ are linearly independent, then $\mathbf{u}_i$ and $\mathbf{u}_j$ are linearly
independent for each pair of $i, j$, where $1 \le i, j \le k$ and $i \neq j$.

(b) If $\mathbf{u}_i$ and $\mathbf{u}_j$ are linearly independent for each pair of $i, j$, where $1 \le i, j \le k$
and $i \neq j$, then $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_k\}$ is linearly independent.

4.31 Let $P_n$ be the set that consists of all real polynomials of degree $n$ or less.
(a) Show that $\{1, 1 + x, 1 + x + x^2\}$ is a basis for $P_2$.

(b) Let $W = \{\, p(x) \mid p(-x) = p(x),\ p(x) \in P_n \,\}$. Show that $W$ is a subspace of
$P_n$, and find a basis for $W$.

4.32 Let $A \in \mathbb{R}^{2 \times 2}$. Show that if $A^2 = I$ but $A \neq \pm I$, then

$$\operatorname{rank}(A + I) = \operatorname{rank}(A - I) = 1.$$

4.33 Let $A$ and $B$ be any given matrices. Show that


rank(A) + rank(B) < rank ( 


where C is an arbitrary matrix. 


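4.34 Let $A, B \in \mathbb{R}^{n \times n}$. Show that if $ABAB = I$, then
$$\operatorname{rank}(I + AB) + \operatorname{rank}(I - AB) = n.$$
4.35 Let $A \in \mathbb{R}^{n \times n}$. Show that
(a) $\operatorname{rank}(\operatorname{adj}(A)) = n$ if $\operatorname{rank}(A) = n$.
(b) $\operatorname{rank}(\operatorname{adj}(A)) = 1$ if $\operatorname{rank}(A) = n - 1$.
(c) $\operatorname{rank}(\operatorname{adj}(A)) = 0$ if $\operatorname{rank}(A) < n - 1$.
4.36 Let $A, B \in \mathbb{R}^{n \times n}$. Show that
$$\operatorname{rank}(AB) \ge \operatorname{rank}(A) + \operatorname{rank}(B) - n.$$
4.37 Let $A_{(1)}, A_{(2)}, \ldots, A_{(k)} \in \mathbb{R}^{n \times n}$. Show that if $A_{(1)}A_{(2)} \cdots A_{(k)} = 0$, then

$$\operatorname{rank}(A_{(1)}) + \operatorname{rank}(A_{(2)}) + \cdots + \operatorname{rank}(A_{(k)}) \le (k - 1)n.$$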

Chapter 5 


Inner Product Spaces 


“Inner product gives a structure to vector space which allows mathematician to build 
geometry out of bare manifold.” 


— Shing-Tung Yau 


We introduced the Euclidean inner product on $\mathbb{R}^n$ in Chapter 3. In this chapter,
we extend the concept of the Euclidean inner product to general vector spaces. We
extract the most important properties of the Euclidean inner product on $\mathbb{R}^n$ and turn
them into axioms that are applicable in general vector spaces. Then, it is reasonable
to use these generalized inner products to define notions of length, distance, and
angle in general vector spaces.


5.1 Inner Products


In this section we use the most important properties of the Euclidean inner product
as axioms to define the general concept of an inner product. We then explain how an
inner product defines notions of length and distance in general vector spaces other
than $\mathbb{R}^n$.


5.1.1 General inner products


The fundamental properties of the Euclidean inner product on $\mathbb{R}^n$ that were listed
in Theorem 3.2 are precisely the axioms in the following definition.


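Definition An inner product on a real vector space $V$ is a function that associates
a real number with each pair of vectors $\mathbf{u}$ and $\mathbf{v}$ in $V$, denoted by $\langle \mathbf{u}, \mathbf{v} \rangle$, in such a
way that the following axioms are satisfied for all vectors $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ in $V$ and all
scalars $k$.
(i) $\langle \mathbf{u}, \mathbf{v} \rangle = \langle \mathbf{v}, \mathbf{u} \rangle$. [Symmetry axiom]
(ii) $\langle \mathbf{u} + \mathbf{v}, \mathbf{w} \rangle = \langle \mathbf{u}, \mathbf{w} \rangle + \langle \mathbf{v}, \mathbf{w} \rangle$. [Additivity axiom]
(iii) $\langle k\mathbf{u}, \mathbf{v} \rangle = k\langle \mathbf{u}, \mathbf{v} \rangle$. [Homogeneity axiom]
(iv) $\langle \mathbf{v}, \mathbf{v} \rangle \ge 0$; $\langle \mathbf{v}, \mathbf{v} \rangle = 0$ if and only if $\mathbf{v} = \mathbf{0}$. [Positivity axiom]

A real vector space with an inner product is called a real inner product space.

Definition Let $V$ be a real inner product space. Then the norm (or length) of a
vector $\mathbf{u}$ in $V$ is defined by
$$\|\mathbf{u}\| := \langle \mathbf{u}, \mathbf{u} \rangle^{1/2}.$$
The distance between two vectors $\mathbf{u}$ and $\mathbf{v}$ in $V$ is defined by
$$d(\mathbf{u}, \mathbf{v}) := \|\mathbf{u} - \mathbf{v}\|.$$
A unit vector is a vector $\mathbf{u}$ with $\|\mathbf{u}\| = 1$.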


The following theorem lists some properties of inner products. 


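Theorem 5.1 Let $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ be vectors in a real inner product space $V$, and $k$
be any scalar. Then

(a) $\langle \mathbf{0}, \mathbf{v} \rangle = \langle \mathbf{v}, \mathbf{0} \rangle = 0$.
(b) $\langle \mathbf{u}, \mathbf{v} + \mathbf{w} \rangle = \langle \mathbf{u}, \mathbf{v} \rangle + \langle \mathbf{u}, \mathbf{w} \rangle$.
(c) $\langle \mathbf{u}, k\mathbf{v} \rangle = k\langle \mathbf{u}, \mathbf{v} \rangle$.
(d) $\langle \mathbf{u} - \mathbf{v}, \mathbf{w} \rangle = \langle \mathbf{u}, \mathbf{w} \rangle - \langle \mathbf{v}, \mathbf{w} \rangle$.
(e) $\langle \mathbf{u}, \mathbf{v} - \mathbf{w} \rangle = \langle \mathbf{u}, \mathbf{v} \rangle - \langle \mathbf{u}, \mathbf{w} \rangle$.

Proof We only prove (a). The proofs of the remaining parts are trivial and we therefore
omit them. We have by Theorem 4.1 (a) and Axiom (iii) [Homogeneity axiom],

$$\langle \mathbf{0}, \mathbf{v} \rangle = \langle 0\mathbf{u}, \mathbf{v} \rangle = 0 \cdot \langle \mathbf{u}, \mathbf{v} \rangle = 0.$$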


5.1.2 Examples 


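(1) Let $\mathbf{u} = [u_1, u_2, \ldots, u_n]^T$ and $\mathbf{v} = [v_1, v_2, \ldots, v_n]^T$ be in $\mathbb{R}^n$. Then the formula

$$\langle \mathbf{u}, \mathbf{v} \rangle := \mathbf{u} \cdot \mathbf{v} = \sum_{i=1}^{n} u_iv_i = \mathbf{u}^T\mathbf{v}$$

defines $\langle \mathbf{u}, \mathbf{v} \rangle$ to be the Euclidean inner product on $\mathbb{R}^n$.

(2) Let $\mathbf{u}, \mathbf{v} \in \mathbb{R}^n$, and $A$ be an invertible $n \times n$ matrix. It can be shown that the
formula
$$\langle \mathbf{u}, \mathbf{v} \rangle_A := \langle A\mathbf{u}, A\mathbf{v} \rangle = (A\mathbf{u})^TA\mathbf{v} = \mathbf{u}^TA^TA\mathbf{v}$$
defines a new inner product on $\mathbb{R}^n$, where $\langle \cdot, \cdot \rangle$ is the Euclidean inner product.
When $A = I$, $\langle \mathbf{u}, \mathbf{v} \rangle_A$ reduces to the Euclidean inner product. In the
following, we only show that it satisfies Axiom (ii) [Additivity axiom] and
Axiom (iv) [Positivity axiom]. One can verify that it also satisfies Axiom (i)
[Symmetry axiom] and Axiom (iii) [Homogeneity axiom].

For Axiom (ii), we have
$$\langle \mathbf{u} + \mathbf{v}, \mathbf{w} \rangle_A = (\mathbf{u} + \mathbf{v})^TA^TA\mathbf{w} = \mathbf{u}^TA^TA\mathbf{w} + \mathbf{v}^TA^TA\mathbf{w} = \langle \mathbf{u}, \mathbf{w} \rangle_A + \langle \mathbf{v}, \mathbf{w} \rangle_A.$$
For Axiom (iv), we have
$$\langle \mathbf{u}, \mathbf{u} \rangle_A = \langle A\mathbf{u}, A\mathbf{u} \rangle = \mathbf{u}^TA^TA\mathbf{u} = \mathbf{y}^T\mathbf{y} \ge 0,$$
where $\mathbf{y} = A\mathbf{u}$. When $\langle \mathbf{u}, \mathbf{u} \rangle_A = 0$, it follows that
$$\mathbf{0} = \mathbf{y} = A\mathbf{u}.$$
Since $A$ is invertible, we obtain $\mathbf{u} = \mathbf{0}$.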
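
The matrix-weighted inner product of example (2) can be explored numerically. The sketch below evaluates it as the Euclidean inner product of the transformed vectors and spot-checks the four axioms on sample data. The matrix `A` and the vectors are hypothetical choices, and `inner_A` is an illustrative name; checking a few vectors illustrates the axioms but does not prove them.

```python
def dot(x, y):
    """Euclidean inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def matvec(M, x):
    return [dot(row, x) for row in M]

def inner_A(A, u, v):
    """<u, v>_A := <Au, Av>, the Euclidean inner product of Au and Av."""
    return dot(matvec(A, u), matvec(A, v))

def vadd(x, y):
    return [a + b for a, b in zip(x, y)]

def scale(k, x):
    return [k * a for a in x]

A = [[1, 2], [0, 1]]          # invertible (determinant 1); hypothetical choice
u, v, w = [1, -1], [2, 3], [0, 4]
k = 5

print(inner_A(A, u, v) == inner_A(A, v, u))                                # symmetry
print(inner_A(A, vadd(u, v), w) == inner_A(A, u, w) + inner_A(A, v, w))    # additivity
print(inner_A(A, scale(k, u), v) == k * inner_A(A, u, v))                  # homogeneity
print(inner_A(A, u, u) > 0)                                                # positivity, u nonzero

I2 = [[1, 0], [0, 1]]
print(inner_A(I2, u, v) == dot(u, v))   # A = I gives back the Euclidean inner product
```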


(3) Let $C[a, b]$ denote the vector space of all continuous functions on $[a, b]$ with
the following operations of function addition and scalar multiplication

$$(f + g)(x) = f(x) + g(x), \qquad (kf)(x) = kf(x),$$

where $f = f(x)$, $g = g(x) \in C[a, b]$ and $k$ is a scalar. Define

$$\langle f, g \rangle := \int_a^b f(x)g(x)\,dx.$$

We show that this formula defines an inner product on $C[a, b]$ by verifying the four
axioms one by one for functions $f = f(x)$, $g = g(x)$, and $s = s(x)$ in $C[a, b]$.

For Axiom (i), we have

$$\langle f, g \rangle = \int_a^b f(x)g(x)\,dx = \int_a^b g(x)f(x)\,dx = \langle g, f \rangle.$$
For Axiom (ii), we have
$$\langle f + g, s \rangle = \int_a^b [f(x) + g(x)]s(x)\,dx = \int_a^b f(x)s(x)\,dx + \int_a^b g(x)s(x)\,dx = \langle f, s \rangle + \langle g, s \rangle.$$

For Axiom (iii), we have for all scalars $k$,

$$\langle kf, g \rangle = \int_a^b kf(x)g(x)\,dx = k\int_a^b f(x)g(x)\,dx = k\langle f, g \rangle.$$
Finally, for Axiom (iv), if $f = f(x)$ is any function in $C[a, b]$, then $f^2(x) \ge 0$
for all $x$ in $[a, b]$. Therefore,

$$\langle f, f \rangle = \int_a^b f^2(x)\,dx \ge 0.$$
Further, because $f^2(x) \ge 0$ and $f = f(x)$ is continuous on $[a, b]$, it follows that

$$\int_a^b f^2(x)\,dx = 0 \iff f(x) = 0 \text{ for all } x \in [a, b].$$

Therefore,

$$\langle f, f \rangle = 0 \iff f = 0.$$


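(4) Let $\mathbb{R}^{n \times n}$ denote the vector space of all $n \times n$ real matrices. An inner product
on $\mathbb{R}^{n \times n}$ is defined by
$$\langle X, Y \rangle := \operatorname{tr}(XY^T),$$
where $X, Y \in \mathbb{R}^{n \times n}$. We recall that the trace of a matrix $A = [a_{ij}] \in \mathbb{R}^{n \times n}$ is
given by
$$\operatorname{tr}(A) = \sum_{i=1}^{n} a_{ii}.$$

In the following, we only show that it satisfies Axioms (ii) and (iv). One can
verify that it also satisfies Axioms (i) and (iii).

For Axiom (ii), we have by Theorem 1.3 (c),
$$\langle X + Y, Z \rangle = \operatorname{tr}((X + Y)Z^T) = \operatorname{tr}(XZ^T + YZ^T) = \operatorname{tr}(XZ^T) + \operatorname{tr}(YZ^T) = \langle X, Z \rangle + \langle Y, Z \rangle,$$
where $X, Y, Z \in \mathbb{R}^{n \times n}$.
For Axiom (iv), let $X = [x_{ij}] \in \mathbb{R}^{n \times n}$. Then

$$\langle X, X \rangle = \operatorname{tr}(XX^T) = \sum_{i=1}^{n}\sum_{j=1}^{n} x_{ij}^2 \ge 0.$$

Thus,

$$\langle X, X \rangle = 0 \iff x_{ij} = 0, \quad 1 \le i, j \le n,$$

i.e., $X$ is the zero matrix.

Remark The Frobenius norm of an $n \times n$ matrix $A = [a_{ij}]$ is defined by

$$\|A\|_F := \langle A, A \rangle^{1/2} = [\operatorname{tr}(AA^T)]^{1/2} = \Big(\sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}^2\Big)^{1/2}.$$

5.2 Angle and Orthogonality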


In this section we define the notion of an angle between two nonzero vectors in 
an inner product space. With this concept, we study some basic relations between 
vectors in an inner product space. 


5.2.1 Angle between two vectors and orthogonality 


We first introduce the Cauchy-Schwarz inequality before we define an angle
between two vectors in general inner product spaces. The proof of the theorem is
left as an exercise.

Theorem 5.2 (Cauchy-Schwarz Inequality) Let $\mathbf{u}$ and $\mathbf{v}$ be two vectors in a real
inner product space $V$. Then

$$|\langle \mathbf{u}, \mathbf{v} \rangle| \le \|\mathbf{u}\| \cdot \|\mathbf{v}\|.$$

In $\mathbb{R}^n$, by using the notation of the Euclidean inner product, the cosine of an
angle $\theta$ between two nonzero vectors $\mathbf{u}$ and $\mathbf{v}$ is defined by (3.3). We are now going to
define the notion of an angle between two nonzero vectors in a general inner product
space $V$. For any nonzero vectors $\mathbf{u}$ and $\mathbf{v}$ in $V$, by using the Cauchy-Schwarz
inequality, we deduce

$$-1 \le \frac{\langle \mathbf{u}, \mathbf{v} \rangle}{\|\mathbf{u}\| \cdot \|\mathbf{v}\|} \le 1. \tag{5.1}$$

Thus, we can define the cosine of the unique angle $\theta$ between two nonzero vectors
$\mathbf{u}$ and $\mathbf{v}$ in $V$ by (5.1) as follows:

$$\cos\theta = \frac{\langle \mathbf{u}, \mathbf{v} \rangle}{\|\mathbf{u}\| \cdot \|\mathbf{v}\|}, \qquad 0 \le \theta \le \pi. \tag{5.2}$$

Observe that in $\mathbb{R}^n$ with the Euclidean inner product, (5.2) agrees with (3.3).

In $\mathbb{R}^n$, two nonzero vectors $\mathbf{u}$ and $\mathbf{v}$ are orthogonal if $\mathbf{u} \cdot \mathbf{v} = 0$, i.e., the angle $\theta$
between them is $\pi/2$. It follows from (5.2) that $\cos\theta = 0$ if and only if $\langle \mathbf{u}, \mathbf{v} \rangle = 0$.
This suggests the following definition in a general inner product space.

Definition Two vectors $\mathbf{u}$ and $\mathbf{v}$ in an inner product space $V$ are called orthogonal
if
$$\langle \mathbf{u}, \mathbf{v} \rangle = 0.$$

Example 1 Let $P_2$ have the inner product

$$\langle p, q \rangle := \int_{-1}^{1} p(x)q(x)\,dx$$
for $p, q \in P_2$. Then the polynomials $p = 2x$ and $q = 3x^2$ are orthogonal, since

$$\langle p, q \rangle = \int_{-1}^{1} p(x)q(x)\,dx = \int_{-1}^{1} 6x^3\,dx = 0.$$
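
The orthogonality in Example 1 can be checked with exact arithmetic. The sketch below represents a polynomial by its list of coefficients and integrates the product term by term. It assumes the interval of integration is [-1, 1], which is an assumption of this sketch (any interval symmetric about 0 gives the same conclusion for an odd integrand); the names `poly_mul` and `inner` are illustrative.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists [a0, a1, a2, ...]."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def inner(p, q):
    """<p, q> = integral of p(x) q(x) over [-1, 1] (interval assumed here)."""
    total = Fraction(0)
    for k, c in enumerate(poly_mul(p, q)):
        if k % 2 == 0:
            # Integral of x^k over [-1, 1] is 2/(k+1) for even k.
            total += Fraction(c) * 2 / (k + 1)
        # For odd k the integral over [-1, 1] is 0, so nothing is added.
    return total

p = [0, 2, 0]     # p(x) = 2x
q = [0, 0, 3]     # q(x) = 3x^2

print(inner(p, q))    # 0, so p and q are orthogonal
print(inner(p, p))    # squared norm of p
```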


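Example 2 Let $\mathbb{R}^{2 \times 2}$ have the inner product $\langle U, V \rangle = \operatorname{tr}(UV^T)$ for $U, V \in \mathbb{R}^{2 \times 2}$.
Then the matrices

$$U = \begin{bmatrix} 1 & 3 \\ 2 & -1 \end{bmatrix}, \qquad V = \begin{bmatrix} -2 & 1 \\ 0 & 1 \end{bmatrix}$$

are orthogonal, since
$$\langle U, V \rangle = 1 \times (-2) + 3 \times 1 + 2 \times 0 + (-1) \times 1 = 0.$$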
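
The trace inner product and the Frobenius norm can be computed entry by entry, because the diagonal entry (i, i) of the product of X with the transpose of Y is the sum over k of X[i][k]*Y[i][k]. The sketch below uses that observation; the matrices `U` and `V` are read off from the arithmetic in Example 2 under the assumption that the entries are listed row by row, and the function names are illustrative.

```python
import math

def trace_inner(X, Y):
    """<X, Y> = tr(X Y^T), computed as the sum of the diagonal of X Y^T."""
    n = len(X)
    return sum(
        sum(X[i][k] * Y[i][k] for k in range(len(X[i])))   # (X Y^T)[i][i]
        for i in range(n)
    )

def frobenius_norm(X):
    """||X||_F = <X, X>^(1/2)."""
    return math.sqrt(trace_inner(X, X))

# Entries assumed from the sum 1*(-2) + 3*1 + 2*0 + (-1)*1 in Example 2.
U = [[1, 3], [2, -1]]
V = [[-2, 1], [0, 1]]

print(trace_inner(U, V))      # 0, so U and V are orthogonal
print(trace_inner(U, U))      # sum of squared entries of U
print(frobenius_norm(U))
```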


5.2.2 Properties of length, distance, and orthogonality 


The following two theorems list some basic properties of length and distance in 
general inner product spaces. 


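Theorem 5.3 Let $\mathbf{u}$ and $\mathbf{v}$ be vectors in an inner product space $V$, and $k$ be any
scalar. Then

(a) $\|\mathbf{u}\| \ge 0$.
(b) $\|\mathbf{u}\| = 0$ if and only if $\mathbf{u} = \mathbf{0}$.
(c) $\|k\mathbf{u}\| = |k| \cdot \|\mathbf{u}\|$.
(d) $\|\mathbf{u} + \mathbf{v}\| \le \|\mathbf{u}\| + \|\mathbf{v}\|$. (Triangle inequality)

Proof We only prove (d); the proofs of the remaining parts are trivial. We have by
the Cauchy-Schwarz inequality,
$$\begin{aligned}
\|\mathbf{u} + \mathbf{v}\|^2 &= \langle \mathbf{u} + \mathbf{v}, \mathbf{u} + \mathbf{v} \rangle = \langle \mathbf{u}, \mathbf{u} \rangle + 2\langle \mathbf{u}, \mathbf{v} \rangle + \langle \mathbf{v}, \mathbf{v} \rangle \\
&\le \|\mathbf{u}\|^2 + 2\|\mathbf{u}\| \cdot \|\mathbf{v}\| + \|\mathbf{v}\|^2 = (\|\mathbf{u}\| + \|\mathbf{v}\|)^2.
\end{aligned}$$

Taking square roots of both sides, we see that (d) holds.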


Theorem 5.4 Let $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ be vectors in an inner product space $V$, and $k$ be
any scalar. Then
(a) $d(\mathbf{u}, \mathbf{v}) = \|\mathbf{u} - \mathbf{v}\| \ge 0$.
(b) $d(\mathbf{u}, \mathbf{v}) = 0$ if and only if $\mathbf{u} = \mathbf{v}$.
(c) $d(\mathbf{u}, \mathbf{v}) = d(\mathbf{v}, \mathbf{u})$.
(d) $d(\mathbf{u}, \mathbf{v}) \le d(\mathbf{u}, \mathbf{w}) + d(\mathbf{w}, \mathbf{v})$. (Triangle inequality)
The proof of the theorem is left as an exercise.

The following theorem extends the result in Theorem 3.6 from $\mathbb{R}^n$ to general inner
product spaces. The proof of the theorem is exactly the same as that of Theorem
3.6 and we therefore omit it.

Theorem 5.5 (Generalized Theorem of Pythagoras) Let $\mathbf{u}$ and $\mathbf{v}$ be orthogonal
vectors in an inner product space $V$. Then
$$\|\mathbf{u} + \mathbf{v}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2.$$
5.2.3 Complement 


We extend the orthogonality of two vectors to that of sets of vectors in inner product 
spaces. 


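Definition Let $W$ be a subspace of an inner product space $V$.

(i) A vector $\mathbf{u}$ in $V$ is said to be orthogonal to $W$ if it is orthogonal to every
vector in $W$.

(ii) The set of all vectors in $V$ that are orthogonal to $W$ is called the orthogonal
complement of $W$, and denoted by $W^\perp$.

The following theorem shows three basic properties of orthogonal complements.

Theorem 5.6 If $W$ is a subspace of a finite-dimensional inner product space $V$,
then

(a) $W^\perp$ is a subspace of $V$.
(b) $W \cap W^\perp = \{\mathbf{0}\}$.
(c) $W \subseteq (W^\perp)^\perp$.
Proof For (a), let $\mathbf{u}, \mathbf{v} \in W^\perp$ and $k \in \mathbb{R}$. Then for all $\mathbf{w} \in W$, we have
$$\langle \mathbf{u}, \mathbf{w} \rangle = 0, \qquad \langle \mathbf{v}, \mathbf{w} \rangle = 0.$$

Therefore,
$$\langle \mathbf{u} + k\mathbf{v}, \mathbf{w} \rangle = \langle \mathbf{u}, \mathbf{w} \rangle + k\langle \mathbf{v}, \mathbf{w} \rangle = 0.$$

Thus, $\mathbf{u} + k\mathbf{v} \in W^\perp$ and it follows from Theorem 4.2 that $W^\perp$ is a subspace.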




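For (b), for any vector $\mathbf{u} \in W \cap W^\perp$, let $\mathbf{u} = \mathbf{u}_1 = \mathbf{u}_2$. Then
$$\langle \mathbf{u}, \mathbf{u} \rangle = \langle \mathbf{u}_1, \mathbf{u}_2 \rangle = 0,$$
because
$$\mathbf{u}_1 \in W \cap W^\perp \subseteq W, \qquad \mathbf{u}_2 \in W \cap W^\perp \subseteq W^\perp.$$
By Axiom (iv) [Positivity axiom], we have $\mathbf{u} = \mathbf{0}$, i.e., $W \cap W^\perp = \{\mathbf{0}\}$.

For (c), for any vector $\mathbf{u} \in W$, $\mathbf{u}$ is orthogonal to $W^\perp$. Besides, $(W^\perp)^\perp$ is the set of
all vectors in $V$ that are orthogonal to $W^\perp$. Then $\mathbf{u} \in (W^\perp)^\perp$, i.e., $W \subseteq (W^\perp)^\perp$.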


5.3 Orthogonal Bases and Gram-Schmidt Process 


In many problems involving inner product spaces, we choose an appropriate basis 
for the vector space to simplify the solution of a problem. Frequently we consider a 
basis in which each pair of vectors is orthogonal. In this section, we reveal how to 
find such a basis. 


5.3.1 Orthogonal and orthonormal bases 


Definition A set of vectors in an inner product space V is called orthogonal if 
all pairs of distinct vectors in the set are orthogonal. An orthogonal set in which 


each vector has norm 1 is called orthonormal. 


If v is a nonzero vector in an inner product space, then by Theorem 5.3 (c), the
vector

$$\frac{1}{\|v\|}\, v$$

has norm 1, since

$$\left\| \frac{1}{\|v\|}\, v \right\| = \left| \frac{1}{\|v\|} \right| \|v\| = \frac{1}{\|v\|}\, \|v\| = 1.$$


The process of multiplying a nonzero vector v by 1/||v|| to obtain a unit vector is 
called normalizing v. An orthogonal set of nonzero vectors can always be converted 
to an orthonormal set by normalizing each of its vectors. 


Remark A set $\{v_1, v_2, \ldots, v_n\}$ is orthonormal if and only if

$$\langle v_i, v_j\rangle = 0, \quad i \neq j; \qquad \langle v_i, v_i\rangle = 1, \quad 1 \le i \le n.$$


Two nonzero orthogonal vectors are linearly independent. The following theorem 
generalizes the property to an orthogonal set of nonzero vectors. 

Theorem 5.7 Let $S = \{v_1, v_2, \ldots, v_n\}$ be an orthogonal set of nonzero vectors in
an inner product space V. Then S is linearly independent.

Proof Consider the following equation

$$k_1 v_1 + k_2 v_2 + \cdots + k_n v_n = 0. \tag{5.3}$$

We want to show that

$$k_1 = k_2 = \cdots = k_n = 0.$$

Beginning with $k_1$, we have by taking the inner product on both sides of (5.3) with
$v_1$,

$$\langle k_1 v_1 + k_2 v_2 + \cdots + k_n v_n, v_1\rangle = \langle 0, v_1\rangle.$$

Since S is orthogonal, we obtain

$$k_1 \langle v_1, v_1\rangle + k_2 \cdot 0 + \cdots + k_n \cdot 0 = 0,$$

i.e.,

$$k_1 \langle v_1, v_1\rangle = 0.$$

Since $v_1 \neq 0$, it follows that $\langle v_1, v_1\rangle \neq 0$ by Axiom (iv) [Positivity axiom]. Then
$k_1 = 0$. Similarly,

$$k_2 = k_3 = \cdots = k_n = 0.$$

Thus, $S = \{v_1, v_2, \ldots, v_n\}$ is linearly independent.


Definition In an inner product space, a basis consisting of orthogonal vectors is 
called an orthogonal basis, and a basis consisting of orthonormal vectors is called 
an orthonormal basis. 


Orthonormal bases for inner product spaces are always convenient to solve 
problems because they simplify the expression of a vector and some related formulas 
as the following two theorems show. 


Theorem 5.8 Let $S = \{v_1, v_2, \ldots, v_n\}$ be an orthonormal basis for an inner product
space V. Then for any u in V,

$$u = \langle u, v_1\rangle v_1 + \langle u, v_2\rangle v_2 + \cdots + \langle u, v_n\rangle v_n.$$

Proof Since $u \in V$ and S is a basis for V, we have

$$u = k_1 v_1 + k_2 v_2 + \cdots + k_n v_n. \tag{5.4}$$

Because S is orthonormal, taking the inner product of u and $v_1$, it follows from
(5.4) that

$$\begin{aligned}
\langle u, v_1\rangle &= \langle k_1 v_1 + k_2 v_2 + \cdots + k_n v_n, v_1\rangle \\
&= k_1 \langle v_1, v_1\rangle + k_2 \langle v_2, v_1\rangle + \cdots + k_n \langle v_n, v_1\rangle \\
&= k_1 \times 1 + k_2 \times 0 + \cdots + k_n \times 0 = k_1.
\end{aligned}$$

Similarly,

$$\langle u, v_j\rangle = k_j, \quad 2 \le j \le n.$$
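
As a small illustration of Theorem 5.8, the following sketch computes the coefficients $\langle u, v_i\rangle$ of a vector relative to an orthonormal basis of $\mathbb{R}^3$ and rebuilds the vector from them. The basis `S`, the vector `u`, and the helper names `dot`, `coordinates`, and `combine` are illustrative choices, not taken from the text.

```python
import math

def dot(a, b):
    """Euclidean inner product of two vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))

def coordinates(u, basis):
    """Coefficients <u, v_i> of u relative to an orthonormal basis."""
    return [dot(u, v) for v in basis]

def combine(coeffs, basis):
    """Linear combination sum_i coeffs[i] * basis[i]."""
    n = len(basis[0])
    return [sum(c * v[k] for c, v in zip(coeffs, basis)) for k in range(n)]

s = 1 / math.sqrt(2)
# An orthonormal basis of R^3 (assumed example data).
S = [[s, -s, 0.0], [0.0, 0.0, 1.0], [s, s, 0.0]]
u = [3.0, 1.0, 2.0]

k = coordinates(u, S)        # k_i = <u, v_i>
u_rebuilt = combine(k, S)    # should agree with u up to rounding
print(k)
print(u_rebuilt)
```

No linear system has to be solved here: with an orthonormal basis, each coordinate is a single inner product.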


Theorem 5.9 Let S be an orthonormal basis for an n-dimensional inner product
space V. If the coordinate vectors of u and v relative to the basis S are given by

$$[u]_S = [u_1, u_2, \ldots, u_n] \quad \text{and} \quad [v]_S = [v_1, v_2, \ldots, v_n],$$

then

(a) $\|u\| = \sqrt{u_1^2 + u_2^2 + \cdots + u_n^2}$.

(b) $d(u, v) = \sqrt{(u_1 - v_1)^2 + (u_2 - v_2)^2 + \cdots + (u_n - v_n)^2}$.

(c) $\langle u, v\rangle = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n$.

Proof We only prove (c) and leave the proofs of the remaining parts as an exercise.
Let $S = \{w_1, w_2, \ldots, w_n\}$. Then

$$u = \sum_{i=1}^{n} u_i w_i, \qquad v = \sum_{j=1}^{n} v_j w_j.$$

Since S is orthonormal, we have

$$\langle u, v\rangle = \Big\langle \sum_{i=1}^{n} u_i w_i, \sum_{j=1}^{n} v_j w_j \Big\rangle = \sum_{i=1}^{n} \sum_{j=1}^{n} u_i v_j \langle w_i, w_j\rangle = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n.$$


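Remark Let $S = \{v_1, v_2, \ldots, v_n\}$ be an orthogonal basis for an inner product space V.
Then normalizing each of these vectors yields the orthonormal basis

$$\left\{ \frac{v_1}{\|v_1\|}, \frac{v_2}{\|v_2\|}, \ldots, \frac{v_n}{\|v_n\|} \right\}.$$

For any vector $u \in V$, it follows from Theorem 5.8 that

$$u = \Big\langle u, \frac{v_1}{\|v_1\|} \Big\rangle \frac{v_1}{\|v_1\|} + \Big\langle u, \frac{v_2}{\|v_2\|} \Big\rangle \frac{v_2}{\|v_2\|} + \cdots + \Big\langle u, \frac{v_n}{\|v_n\|} \Big\rangle \frac{v_n}{\|v_n\|},$$

which can be rewritten as

$$u = \frac{\langle u, v_1\rangle}{\|v_1\|^2}\, v_1 + \frac{\langle u, v_2\rangle}{\|v_2\|^2}\, v_2 + \cdots + \frac{\langle u, v_n\rangle}{\|v_n\|^2}\, v_n.$$


5.3.2 Projection theorem 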


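Theorem 5.10 (Projection Theorem) Let W be a finite-dimensional subspace of
an inner product space V. Then every vector u in V can be expressed in exactly one
way as

$$u = w_1 + w_2,$$

where $w_1$ is in W and $w_2$ is in $W^\perp$.

Proof First, we prove the existence of $w_1$ and $w_2$. Let dim(W) = n and
$\{v_1, v_2, \ldots, v_n\}$ be an orthonormal basis for W. For any $u \in V$, we construct
the following two vectors

$$w_1 = \sum_{i=1}^{n} \langle u, v_i\rangle v_i, \qquad w_2 = u - w_1.$$

Then $u = w_1 + w_2$. Obviously, $w_1 \in W$. We want to show that $w_2 \in W^\perp$, i.e.,
$\langle w_2, w\rangle = 0$ for all $w \in W$. Let $w \in W$ and then $w = \sum_{j=1}^{n} k_j v_j$. Thus,

$$\langle w_2, w\rangle = \langle u - w_1, w\rangle = \langle u, w\rangle - \langle w_1, w\rangle. \tag{5.5}$$

Since

$$\langle u, w\rangle = \Big\langle u, \sum_{j=1}^{n} k_j v_j \Big\rangle = \sum_{j=1}^{n} k_j \langle u, v_j\rangle$$

and

$$\langle w_1, w\rangle = \Big\langle \sum_{i=1}^{n} \langle u, v_i\rangle v_i, \sum_{j=1}^{n} k_j v_j \Big\rangle = \sum_{i=1}^{n} \sum_{j=1}^{n} \langle u, v_i\rangle k_j \langle v_i, v_j\rangle = \sum_{j=1}^{n} k_j \langle u, v_j\rangle, \qquad \langle v_i, v_j\rangle = \begin{cases} 1, & i = j, \\ 0, & i \neq j, \end{cases}$$

substituting them into (5.5), we have $\langle w_2, w\rangle = 0$ for any $w \in W$. Hence $w_2 \in W^\perp$.

Second, we prove the uniqueness of the expression. Let $u = w_1' + w_2'$ with $w_1' \in W$
and $w_2' \in W^\perp$. Also $u = w_1 + w_2$ with $w_1 \in W$ and $w_2 \in W^\perp$. We then have

$$u - u = 0 = (w_1' - w_1) + (w_2' - w_2).$$

Hence

$$w_1 - w_1' = w_2' - w_2.$$

Let $q = w_1 - w_1' = w_2' - w_2$. By Theorem 5.6 (b), we know that $q \in W \cap W^\perp = \{0\}$.
Then

$$w_1 - w_1' = 0 = w_2' - w_2.$$

Thus,

$$w_1 = w_1', \qquad w_2 = w_2'.$$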


Corollary If W is a subspace of an inner product space V with dim(V) = m, then

$$\dim(W) + \dim(W^\perp) = m. \tag{5.6}$$

Proof By using Theorem 4.14, Theorem 5.10, and Theorem 5.6 (b), we have

$$\dim(W) + \dim(W^\perp) = \dim(W + W^\perp) + \dim(W \cap W^\perp) = \dim(V) + 0 = m.$$

Remark Let V be an inner product space with dim(V) = m. Since $W^\perp$ is a
subspace of V, we have by the corollary above,

$$\dim(W^\perp) + \dim((W^\perp)^\perp) = m. \tag{5.7}$$

By (5.6) and (5.7), we obtain

$$\dim(W) = \dim((W^\perp)^\perp).$$

It follows from Theorem 5.6 (c) and Theorem 4.13 that

$$W = (W^\perp)^\perp.$$

Because W and $W^\perp$ are orthogonal complements of one another, we say that W
and $W^\perp$ are orthogonal complements.

In Theorem 5.10, the vector $w_1$ is called the orthogonal projection of u on W
and is denoted by $\operatorname{proj}_W u$. The vector $w_2$ is called the component of u orthogonal
to W and is denoted by $\operatorname{proj}_{W^\perp} u$. Thus,

$$u = \operatorname{proj}_W u + \operatorname{proj}_{W^\perp} u \quad \text{or} \quad \operatorname{proj}_{W^\perp} u = u - \operatorname{proj}_W u.$$


The following theorem gives formulas to compute orthogonal projections onto a 
finite-dimensional subspace. 


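Theorem 5.11 Let W be a finite-dimensional subspace of an inner product space
V.

(a) If $\{v_1, v_2, \ldots, v_r\}$ is an orthonormal basis for W and u is any vector in V,
then

$$\operatorname{proj}_W u = \langle u, v_1\rangle v_1 + \langle u, v_2\rangle v_2 + \cdots + \langle u, v_r\rangle v_r.$$

(b) If $\{v_1, v_2, \ldots, v_r\}$ is an orthogonal basis for W and u is any vector in V, then

$$\operatorname{proj}_W u = \frac{\langle u, v_1\rangle}{\|v_1\|^2}\, v_1 + \frac{\langle u, v_2\rangle}{\|v_2\|^2}\, v_2 + \cdots + \frac{\langle u, v_r\rangle}{\|v_r\|^2}\, v_r. \tag{5.8}$$

Proof We only prove (a); part (b) then follows by applying (a) to the normalized
basis $\{v_1/\|v_1\|, \ldots, v_r/\|v_r\|\}$. We have by Theorem 5.10,

$$u = \operatorname{proj}_W u + \operatorname{proj}_{W^\perp} u.$$

Since $\operatorname{proj}_W u \in W$, we have

$$\operatorname{proj}_W u = k_1 v_1 + k_2 v_2 + \cdots + k_r v_r.$$

Then

$$u = \sum_{j=1}^{r} k_j v_j + \operatorname{proj}_{W^\perp} u.$$

Hence

$$\langle u, v_i\rangle = \Big\langle \sum_{j=1}^{r} k_j v_j + \operatorname{proj}_{W^\perp} u, v_i \Big\rangle = \sum_{j=1}^{r} k_j \langle v_j, v_i\rangle + \langle \operatorname{proj}_{W^\perp} u, v_i\rangle = k_i, \quad 1 \le i \le r.$$

Thus, (a) holds.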
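
Formula (5.8) translates directly into a short computation. The sketch below is a minimal illustration with made-up data (the vectors `v1`, `v2`, `u` and the helper `project` are assumptions for this example): it projects a vector of $\mathbb{R}^3$ onto the span of two orthogonal vectors and then forms the component orthogonal to that subspace. Exact rational arithmetic is used so the orthogonality check is not blurred by rounding.

```python
from fractions import Fraction

def dot(a, b):
    """Euclidean inner product of two vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))

def project(u, orthogonal_basis):
    """Orthogonal projection of u onto span(orthogonal_basis), as in (5.8)."""
    proj = [Fraction(0)] * len(u)
    for v in orthogonal_basis:
        c = Fraction(dot(u, v), dot(v, v))      # <u, v> / ||v||^2
        proj = [p + c * x for p, x in zip(proj, v)]
    return proj

# W = span{v1, v2}; v1 and v2 are orthogonal (hypothetical example data).
v1, v2 = [1, -1, 0], [1, 1, 0]
u = [2, 3, 5]

p = project(u, [v1, v2])             # proj_W u
r = [a - b for a, b in zip(u, p)]    # component of u orthogonal to W
print(p, r)
```

The basis passed to `project` must be orthogonal; for a general basis, formula (5.8) does not apply until the Gram-Schmidt process has been used.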


We next provide a geometric perspective to depict the relations between the 
nullspace and the row space of a matrix. 


Theorem 5.12 Let A be an m × n matrix. Then

(a) The nullspace of A and the row space of A are orthogonal complements in $\mathbb{R}^n$
with respect to the Euclidean inner product.

(b) The nullspace of $A^T$ and the column space of A are orthogonal complements
in $\mathbb{R}^m$ with respect to the Euclidean inner product.

Proof For (a), let $r_1, r_2, \ldots, r_m$ be the row vectors of A. First, we want to show
that the orthogonal complement of $\operatorname{span}\{r_1, r_2, \ldots, r_m\}$ is the nullspace of A. To do
this we must show that if a vector v is orthogonal to $\operatorname{span}\{r_1, r_2, \ldots, r_m\}$, then v is
in the nullspace of A. Conversely, if a vector u is in the nullspace of A, then u is
orthogonal to $\operatorname{span}\{r_1, r_2, \ldots, r_m\}$.

Assume first that v is orthogonal to $\operatorname{span}\{r_1, r_2, \ldots, r_m\}$. Then v is orthogonal
to all row vectors $r_1, r_2, \ldots, r_m$, i.e., $r_i v = 0$ for $1 \le i \le m$. We have

$$Av = \begin{bmatrix} r_1 \\ r_2 \\ \vdots \\ r_m \end{bmatrix} v = \begin{bmatrix} r_1 v \\ r_2 v \\ \vdots \\ r_m v \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} = 0. \tag{5.9}$$

Thus, v is a solution of Ax = 0, i.e., v is in the nullspace of A.

Conversely, assume that u is in the nullspace of A. Then Au = 0. It follows
from (5.9) that $r_i u = \langle u, r_i\rangle = 0$ for $1 \le i \le m$. Let $w \in \operatorname{span}\{r_1, r_2, \ldots, r_m\}$ and
then $w = \sum_{i=1}^{m} c_i r_i$. Taking the inner product of u and w yields

$$\langle u, w\rangle = \Big\langle u, \sum_{i=1}^{m} c_i r_i \Big\rangle = \sum_{i=1}^{m} c_i \langle u, r_i\rangle = 0.$$

Then u is orthogonal to the row space of A.

Thus, the orthogonal complement of the row space of A is the nullspace of A.
Since $W = (W^\perp)^\perp$ for a subspace W in a finite-dimensional space, the orthogonal
complement of the nullspace of A is the row space of A.

For (b), by applying the results in (a) to $A^T$, it follows that the nullspace of $A^T$ and
the row space of $A^T$ are orthogonal complements in $\mathbb{R}^m$. Thus, the nullspace of $A^T$
and the column space of A are orthogonal complements in $\mathbb{R}^m$.
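
Part (a) can be checked numerically on a small case. The sketch below uses a hypothetical 2 × 3 matrix `A`, a vector `x` chosen to solve Ax = 0, and arbitrary coefficients `c`; none of these come from the text. It confirms that `x` is orthogonal to a linear combination of the rows of `A`.

```python
def dot(a, b):
    """Euclidean inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical 2 x 3 matrix, stored as a list of its row vectors r_1, r_2.
A = [[1, 0, 1],
     [0, 1, 1]]

# x solves Ax = 0, so it lies in the nullspace of A.
x = [-1, -1, 1]
Ax = [dot(row, x) for row in A]

# An arbitrary vector w = c_1 r_1 + c_2 r_2 of the row space.
c = [4, -7]
w = [sum(ci * row[k] for ci, row in zip(c, A)) for k in range(3)]

print(Ax)          # each entry is <r_i, x>
print(dot(w, x))   # inner product of a row-space vector with x
```

Because each entry of Ax is the inner product of a row with x, the condition Ax = 0 is exactly the statement that x is orthogonal to every row, and hence to every combination of rows.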


5.3.3 Gram-Schmidt process 


In order to produce orthogonal (or orthonormal) bases, we introduce the Gram-
Schmidt process. Let

$$S = \{u_1, u_2, \ldots, u_n\}$$

be a linearly independent set. The following process converts S into an orthogonal
set.

Step 1. Let $v_1 = u_1$.

Step 2. As illustrated in Figure 5.1, we can obtain a vector $v_2$ that is orthogonal
to $v_1$ by computing the component of $u_2$ orthogonal to the space $W_1 = \operatorname{span}\{v_1\}$.
By using (5.8), we have

$$v_2 = u_2 - \operatorname{proj}_{W_1} u_2 = u_2 - \frac{\langle u_2, v_1\rangle}{\|v_1\|^2}\, v_1.$$

Of course, if $v_2 = 0$, then $v_2$ is not a basis vector. However, this cannot
happen, since it would then follow from the preceding formula for $v_2$ that

$$u_2 = \frac{\langle u_2, v_1\rangle}{\|v_1\|^2}\, v_1 = \frac{\langle u_2, v_1\rangle}{\|u_1\|^2}\, u_1,$$

which implies that $u_2$ is a multiple of $u_1$, contradicting the linear
independence of S.

Step 3. To construct a vector $v_3$ that is orthogonal to both $v_1$ and $v_2$, we compute
the component of $u_3$ orthogonal to the space $W_2 = \operatorname{span}\{v_1, v_2\}$. See
Figure 5.2. From (5.8) again,

$$v_3 = u_3 - \operatorname{proj}_{W_2} u_3 = u_3 - \frac{\langle u_3, v_1\rangle}{\|v_1\|^2}\, v_1 - \frac{\langle u_3, v_2\rangle}{\|v_2\|^2}\, v_2.$$

As in Step 2, the linear independence of S ensures that $v_3 \neq 0$.

Step 4. To determine a vector $v_4$ that is orthogonal to $v_1$, $v_2$, and $v_3$, we compute
the component of $u_4$ orthogonal to the space $W_3 = \operatorname{span}\{v_1, v_2, v_3\}$. It
follows from (5.8) that

$$v_4 = u_4 - \operatorname{proj}_{W_3} u_4 = u_4 - \frac{\langle u_4, v_1\rangle}{\|v_1\|^2}\, v_1 - \frac{\langle u_4, v_2\rangle}{\|v_2\|^2}\, v_2 - \frac{\langle u_4, v_3\rangle}{\|v_3\|^2}\, v_3,$$

and $v_4 \neq 0$.

Step n. $v_n = u_n - \sum_{i=1}^{n-1} \frac{\langle u_n, v_i\rangle}{\|v_i\|^2}\, v_i$ and $v_n \neq 0$.

The preceding step-by-step construction for converting an arbitrary linearly independent
set into an orthogonal set is called the Gram-Schmidt process.


Figure 5.1 The vector $v_2 = u_2 - \operatorname{proj}_{W_1} u_2$.    Figure 5.2 The vector $v_3 = u_3 - \operatorname{proj}_{W_2} u_3$.

Remark We have

$$\operatorname{span}\{v_1, v_2, \ldots, v_j\} = \operatorname{span}\{u_1, u_2, \ldots, u_j\}, \quad 1 \le j \le n.$$

Moreover, $v_{k+1}$ is orthogonal to $\operatorname{span}\{u_1, u_2, \ldots, u_k\}$ for any k. Thus, every nonzero
finite-dimensional inner product space has an orthogonal (or orthonormal) basis.


Example Consider $\mathbb{R}^3$ with the Euclidean inner product. Apply the Gram-Schmidt
process to transform the basis vectors

$$u_1 = [1, -1, 0], \quad u_2 = [-1, 1, 1], \quad u_3 = [1, 1, 1]$$

into an orthogonal basis $\{v_1, v_2, v_3\}$, and then normalize the orthogonal basis vectors
to obtain an orthonormal basis $\{q_1, q_2, q_3\}$.

Solution By using the Gram-Schmidt process, we have

Step 1. $v_1 = u_1 = [1, -1, 0]$.

Step 2. $v_2 = u_2 - \operatorname{proj}_{W_1} u_2 = u_2 - \dfrac{\langle u_2, v_1\rangle}{\|v_1\|^2}\, v_1 = [-1, 1, 1] + [1, -1, 0] = [0, 0, 1]$.

Step 3. $v_3 = u_3 - \operatorname{proj}_{W_2} u_3 = u_3 - \dfrac{\langle u_3, v_1\rangle}{\|v_1\|^2}\, v_1 - \dfrac{\langle u_3, v_2\rangle}{\|v_2\|^2}\, v_2 = [1, 1, 1] - \dfrac{0}{2}\,[1, -1, 0] - [0, 0, 1] = [1, 1, 0]$.

Thus,

$$v_1 = [1, -1, 0], \quad v_2 = [0, 0, 1], \quad v_3 = [1, 1, 0]$$

form an orthogonal basis for $\mathbb{R}^3$. The norms of these vectors are

$$\|v_1\| = \sqrt{2}, \quad \|v_2\| = 1, \quad \|v_3\| = \sqrt{2}.$$

Therefore, an orthonormal basis for $\mathbb{R}^3$ is $S = \{q_1, q_2, q_3\}$, where

$$q_1 = \left[ \frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}, 0 \right], \quad q_2 = [0, 0, 1], \quad q_3 = \left[ \frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}, 0 \right].$$
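
The steps of the Gram-Schmidt process can be sketched in a few lines of code. The function `gram_schmidt` below is an illustrative helper (its name and interface are assumptions, not part of the text); it subtracts from each $u_k$ its projections onto the previously computed $v_i$, using exact rational arithmetic. It is applied to the vectors $u_1 = [1, -1, 0]$, $u_2 = [-1, 1, 1]$, $u_3 = [1, 1, 1]$ of this example.

```python
from fractions import Fraction

def dot(a, b):
    """Euclidean inner product of two vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))

def gram_schmidt(vectors):
    """Return an orthogonal list spanning the same space as `vectors`."""
    ortho = []
    for u in vectors:
        v = [Fraction(x) for x in u]
        for w in ortho:
            c = dot(u, w) / dot(w, w)          # <u_k, v_i> / ||v_i||^2
            v = [a - c * b for a, b in zip(v, w)]
        if all(a == 0 for a in v):
            raise ValueError("input vectors are linearly dependent")
        ortho.append(v)
    return ortho

U = [[1, -1, 0], [-1, 1, 1], [1, 1, 1]]
V = gram_schmidt(U)
squared_norms = [dot(v, v) for v in V]
print(V)
print(squared_norms)
```

Dividing each `V[i]` by the square root of `squared_norms[i]` gives the orthonormal vectors; that step is left out here so that all arithmetic stays exact.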


5.3.4 QR-decomposition 


Let A be an m × n (m ≥ n) matrix with linearly independent column vectors. If
Q is the matrix with orthonormal column vectors that results from applying the
Gram-Schmidt process to the column vectors of A, what is the relationship between
A and Q?
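
To solve this problem, let

$$A = [\, u_1 \mid u_2 \mid \cdots \mid u_n \,], \qquad Q = [\, q_1 \mid q_2 \mid \cdots \mid q_n \,],$$

where $u_j$ and $q_j$ ($1 \le j \le n$) are column vectors of A and Q, respectively. It follows
from Theorem 5.8 that

$$\begin{aligned}
u_1 &= \langle u_1, q_1\rangle q_1 + \langle u_1, q_2\rangle q_2 + \cdots + \langle u_1, q_n\rangle q_n \\
u_2 &= \langle u_2, q_1\rangle q_1 + \langle u_2, q_2\rangle q_2 + \cdots + \langle u_2, q_n\rangle q_n \\
&\;\;\vdots \\
u_n &= \langle u_n, q_1\rangle q_1 + \langle u_n, q_2\rangle q_2 + \cdots + \langle u_n, q_n\rangle q_n.
\end{aligned}$$

It can be written in matrix form

$$[\, u_1 \mid u_2 \mid \cdots \mid u_n \,] = [\, q_1 \mid q_2 \mid \cdots \mid q_n \,] \begin{bmatrix} \langle u_1, q_1\rangle & \langle u_2, q_1\rangle & \cdots & \langle u_n, q_1\rangle \\ \langle u_1, q_2\rangle & \langle u_2, q_2\rangle & \cdots & \langle u_n, q_2\rangle \\ \vdots & \vdots & & \vdots \\ \langle u_1, q_n\rangle & \langle u_2, q_n\rangle & \cdots & \langle u_n, q_n\rangle \end{bmatrix},$$

or more briefly as

$$A = QR, \tag{5.10}$$

where

$$R = \begin{bmatrix} \langle u_1, q_1\rangle & \langle u_2, q_1\rangle & \cdots & \langle u_n, q_1\rangle \\ \langle u_1, q_2\rangle & \langle u_2, q_2\rangle & \cdots & \langle u_n, q_2\rangle \\ \vdots & \vdots & & \vdots \\ \langle u_1, q_n\rangle & \langle u_2, q_n\rangle & \cdots & \langle u_n, q_n\rangle \end{bmatrix}.$$

It is a property of the Gram-Schmidt process that for $j \ge 2$, the vector $q_j$ is
orthogonal to $u_1, u_2, \ldots, u_{j-1}$. Therefore, all entries below the main diagonal of
R are zero,

$$R = \begin{bmatrix} \langle u_1, q_1\rangle & \langle u_2, q_1\rangle & \cdots & \langle u_n, q_1\rangle \\ 0 & \langle u_2, q_2\rangle & \cdots & \langle u_n, q_2\rangle \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \langle u_n, q_n\rangle \end{bmatrix}.$$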


All the diagonal entries $\langle u_j, q_j\rangle$ ($1 \le j \le n$) of R are nonzero. We prove this
fact as follows:

For $\langle u_1, q_1\rangle$, since $q_1 = \dfrac{u_1}{\|u_1\|} \neq 0$, we know that $\langle u_1, q_1\rangle = \|u_1\| \neq 0$.

For $\langle u_2, q_2\rangle$, if $\langle u_2, q_2\rangle = 0$, then

$$u_2 = \langle u_2, q_1\rangle q_1 = \frac{\langle u_2, u_1\rangle}{\|u_1\|^2}\, u_1,$$

i.e., $u_1$ and $u_2$ are linearly dependent, which contradicts the fact that $u_1$ and $u_2$ are
linearly independent.

Moreover, for $\langle u_j, q_j\rangle$, if $\langle u_j, q_j\rangle = 0$, then

$$u_j \in \operatorname{span}\{q_1, q_2, \ldots, q_{j-1}\}.$$

But

$$u_1, u_2, \ldots, u_{j-1} \in \operatorname{span}\{q_1, q_2, \ldots, q_{j-1}\},$$

which implies

$$u_1, u_2, \ldots, u_{j-1}, u_j \in \operatorname{span}\{q_1, q_2, \ldots, q_{j-1}\}.$$

Thus, $u_1, u_2, \ldots, u_{j-1}, u_j$ are linearly dependent by Theorem 4.9 (a). A contradiction
again! Therefore, $\langle u_j, q_j\rangle \neq 0$ for $1 \le j \le n$. Then R is invertible.


Formula (5.10) is a decomposition of A in the form of the product of a matrix 
Q with orthonormal column vectors and an invertible upper triangular matrix R. 
We call (5.10) the QR-decomposition of A. In summary, we have the following 
theorem. 


Theorem 5.13 (QR-Decomposition) Let A be an m × n matrix with linearly
independent column vectors. Then A can be decomposed as

$$A = QR,$$

where Q is an m × n matrix with orthonormal column vectors and R is an n × n
invertible upper triangular matrix.


Remark Recall from Theorem 4.28 that if A is an n × n matrix, then the invertibility
of A is equivalent to linear independence of the column vectors. Thus, every
invertible matrix has a QR-decomposition. 
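
The construction behind (5.10) can be sketched as follows. The function `qr_gram_schmidt` is a hypothetical helper written for this illustration: it takes the columns $u_j$ of A, produces the orthonormal columns $q_j$ by the Gram-Schmidt process, and fills in the entries $R_{ij} = \langle u_j, q_i\rangle$ for $i \le j$. Floating-point arithmetic is used, so equality holds only up to rounding. The test matrix reuses the three vectors from the Gram-Schmidt example as its columns.

```python
import math

def dot(a, b):
    """Euclidean inner product of two vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))

def qr_gram_schmidt(columns):
    """columns: list of the column vectors of A.
    Returns (list of columns of Q, R as a list of rows)."""
    n = len(columns)
    q_cols = []
    R = [[0.0] * n for _ in range(n)]
    for j, u in enumerate(columns):
        v = [float(x) for x in u]
        for i, q in enumerate(q_cols):
            R[i][j] = dot(u, q)                       # entry <u_j, q_i>, i < j
            v = [a - R[i][j] * b for a, b in zip(v, q)]
        norm = math.sqrt(dot(v, v))
        if norm == 0.0:
            raise ValueError("columns are linearly dependent")
        q_cols.append([a / norm for a in v])
        R[j][j] = dot(u, q_cols[j])                   # diagonal entry <u_j, q_j>
    return q_cols, R

cols = [[1, -1, 0], [-1, 1, 1], [1, 1, 1]]
Q, R = qr_gram_schmidt(cols)

# Column j of the product QR is sum_i R[i][j] * q_i.
rebuilt = [[sum(R[i][j] * Q[i][k] for i in range(3)) for k in range(3)]
           for j in range(3)]
print(R)
print(rebuilt)
```

This mirrors the derivation in the text rather than a numerically robust method; library routines typically use other algorithms to compute the same factorization.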


We conclude this section by adding three more results to the following theorem, 
which involves all major topics we have studied so far. 


Theorem 5.14 Let A be an n × n matrix and $T_A : \mathbb{R}^n \to \mathbb{R}^n$ be multiplication by
A. Then the following are equivalent.

(a) A is invertible.

(b) Ax = 0 has only the trivial solution.

(c) The reduced row-echelon form of A is $I_n$.

(d) A is expressible as a product of elementary matrices.

(e) Ax = b is consistent for every n × 1 matrix b.

(f) Ax = b has exactly one solution for every n × 1 matrix b.

(g) $\det(A) \neq 0$.

(h) The range of $T_A$ is $\mathbb{R}^n$.

(i) $T_A$ is one-to-one.

(j) The column vectors of A are linearly independent.

(k) The row vectors of A are linearly independent.

(l) The column vectors of A span $\mathbb{R}^n$.

(m) The row vectors of A span $\mathbb{R}^n$.

(n) The column vectors of A form a basis for $\mathbb{R}^n$.

(o) The row vectors of A form a basis for $\mathbb{R}^n$.

(p) A has rank n.

(q) A has nullity 0.

(r) The orthogonal complement of the nullspace of A is $\mathbb{R}^n$.

(s) The orthogonal complement of the row space of A is {0}.

(t) A has a QR-decomposition.


5.4 Best Approximation and Least Squares 


We show how orthogonal projections can be used to solve certain approximation 
problems. The results obtained here have a wide variety of applications in both 
mathematics and science. 

5.4.1 Orthogonal projections viewed as approximations 


If P is a point in $\mathbb{R}^3$ and W is a plane through the origin O, then the point Q in W
closest to P is obtained by dropping a perpendicular from P to W. See Figure 5.3.
Therefore, let $u = \overrightarrow{OP}$ and then the distance between P and W is given by

$$\min_{w \in W} \|u - w\| = \|u - \operatorname{proj}_W u\|.$$

See Figure 5.4. Thus, $\operatorname{proj}_W u$ is the “best approximation” to u by vectors in W.


Figure 5.3 Q is the point in W closest to P


Figure 5.4 $\|u - w\|$ is minimized by $w = \operatorname{proj}_W u$


Theorem 5.15 (Best Approximation Theorem) Let W be a finite-dimensional
subspace of an inner product space V and u be in V. Then $\operatorname{proj}_W u$ is the best
approximation to u from W in the sense that

$$\|u - \operatorname{proj}_W u\| < \|u - w\|$$

for every vector w in W with $w \neq \operatorname{proj}_W u$.


Proof For every vector w in W, we can write by Theorem 5.10 (Projection
Theorem),

$$u - w = \underbrace{(u - \operatorname{proj}_W u)}_{\in\, W^\perp} + \underbrace{(\operatorname{proj}_W u - w)}_{\in\, W}.$$

Thus, by Theorem 5.5,

$$\|u - w\|^2 = \|u - \operatorname{proj}_W u\|^2 + \|\operatorname{proj}_W u - w\|^2.$$

If $w \neq \operatorname{proj}_W u$, then the second term in this sum is positive, so that

$$\|u - w\|^2 > \|u - \operatorname{proj}_W u\|^2,$$

which implies that

$$\|u - w\| > \|u - \operatorname{proj}_W u\|.$$


5.4.2 Least squares solutions of linear systems 


Up to now, we have been mainly concerned with consistent systems of linear
equations. However, inconsistent linear systems also appear in science and engineering.
If Ax = b has no solution, then for any x, $\|Ax - b\| \neq 0$ with the Euclidean
norm. We therefore study the following least squares problem.


(1) Least squares problem. Given a linear system Ax = b of m equations in n
unknowns, find a vector $x \in \mathbb{R}^n$ that minimizes $\|Ax - b\|$ with respect to the
Euclidean norm on $\mathbb{R}^m$. Such x is called a least squares solution of Ax = b.

Let $A = [\, c_1 \mid c_2 \mid \cdots \mid c_n \,]$, where $c_i$ ($1 \le i \le n$) are the column vectors of
A. Then

$$W = \operatorname{span}\{c_1, c_2, \ldots, c_n\} = \{r = Ax \mid x \in \mathbb{R}^n\}, \tag{5.11}$$

i.e., $Ax \in W$.


(2) Review of Theorem 5.12 (b). If A is an m × n matrix, then the nullspace of
$A^T$ and the column space W of A are orthogonal complements in $\mathbb{R}^m$ with
respect to the Euclidean inner product, i.e.,

$$W^\perp = \text{nullspace of } A^T.$$


(3) Find a solution of the least squares problem. It follows from Theorem 5.15
(Best Approximation Theorem) that the closest vector in W to b is the
orthogonal projection of b on W. Thus, for a vector $x \in \mathbb{R}^n$ to be a least
squares solution of Ax = b, x must satisfy

$$Ax = \operatorname{proj}_W b,$$

and then

$$b - Ax = b - \operatorname{proj}_W b.$$

Since $b - \operatorname{proj}_W b \in W^\perp$, it follows from (2) that

$$b - Ax \in W^\perp = \text{nullspace of } A^T.$$

Therefore, a least squares solution of Ax = b must satisfy

$$A^T(b - Ax) = 0,$$

i.e.,

$$A^T A x = A^T b.$$

This is called the normal system associated with Ax = b. Thus, the problem
of finding a least squares solution of Ax = b has been reduced to the problem
of finding an exact solution of the associated normal system. The following
observations are about the normal system:


(a) The normal system has n equations in n unknowns.


(b) The normal system is consistent, since it is satisfied by a least squares
solution of Ax = b.


(c) The normal system may have infinitely many solutions and all of its
solutions are least squares solutions of Ax = b.


5.4.3 Uniqueness of least squares solutions 


We establish conditions under which a linear system is guaranteed to have a unique 
least squares solution. We need the following theorem. 


Theorem 5.16 Let A be an m × n matrix. Then the following are equivalent.


(a) A has linearly independent column vectors.
(b) $A^T A$ is invertible.


Proof (a) ⇒ (b): Assume that A has linearly independent column vectors. The
size of the matrix $A^T A$ is n × n, so we can prove that this matrix is invertible by showing
that the linear system $A^T A x = 0$ has only the trivial solution. If x is any
solution of this system, then

$$\langle Ax, Ax\rangle = (Ax)^T (Ax) = x^T A^T A x = x^T 0 = 0,$$

which implies Ax = 0 by Axiom (iv) [Positivity axiom]. Since A has linearly
independent column vectors, x = 0 by Theorem 4.27.


(b) ⇒ (a): Assume that $A^T A$ is invertible. To prove that A has linearly independent
column vectors, it suffices to prove that Ax = 0 has only the trivial solution by
Theorem 4.27. But if x is any solution of Ax = 0, then $A^T A x = A^T 0 = 0$, so x = 0
from the invertibility of $A^T A$.


The following theorem is a direct consequence of Theorem 5.16 and one can prove
it easily.


Theorem 5.17 (Uniqueness of Least Squares Solutions) Let A be an m × n matrix
with linearly independent column vectors. Then for every b in $\mathbb{R}^m$, the linear
system Ax = b has a unique least squares solution given by

$$x = (A^T A)^{-1} A^T b.$$


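Example Find the least squares solution of the linear system Ax = b given by

$$\begin{aligned}
x_1 - x_2 &= 2 \\
x_1 &= -8 \\
x_1 + x_2 &= 12 \\
x_1 + 2x_2 &= 2
\end{aligned}$$

and find the orthogonal projection of b on the column space of A.


Solution Note that

$$A = \begin{bmatrix} 1 & -1 \\ 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix} \quad \text{and} \quad b = \begin{bmatrix} 2 \\ -8 \\ 12 \\ 2 \end{bmatrix}.$$

Since A has linearly independent column vectors, we know in advance that there is
a unique least squares solution. We have

$$A^T A = \begin{bmatrix} 1 & 1 & 1 & 1 \\ -1 & 0 & 1 & 2 \end{bmatrix} \begin{bmatrix} 1 & -1 \\ 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix} = \begin{bmatrix} 4 & 2 \\ 2 & 6 \end{bmatrix}$$

and

$$A^T b = \begin{bmatrix} 1 & 1 & 1 & 1 \\ -1 & 0 & 1 & 2 \end{bmatrix} \begin{bmatrix} 2 \\ -8 \\ 12 \\ 2 \end{bmatrix} = \begin{bmatrix} 8 \\ 14 \end{bmatrix}.$$

So the normal system $A^T A x = A^T b$ is

$$\begin{bmatrix} 4 & 2 \\ 2 & 6 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 8 \\ 14 \end{bmatrix}.$$

Solving this system yields the least squares solution

$$x_1 = 1, \quad x_2 = 2.$$

Thus, the orthogonal projection of b on the column space of A is

$$Ax = \begin{bmatrix} 1 & -1 \\ 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} -1 \\ 1 \\ 3 \\ 5 \end{bmatrix}.$$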
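
The normal-system computation can be sketched in code. The example below assumes that the system in this example reads $x_1 - x_2 = 2$, $x_1 = -8$, $x_1 + x_2 = 12$, $x_1 + 2x_2 = 2$; the matrix `A` and vector `b` encode that assumption, and the helper names are illustrative. It forms $A^T A$ and $A^T b$, solves the resulting 2 × 2 system by Cramer's rule with exact fractions, and then computes the projection $Ax$ and the residual $b - Ax$.

```python
from fractions import Fraction

def transpose(M):
    return [list(col) for col in zip(*M)]

def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def matmul(M, N):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*N)]
            for row in M]

# Assumed data for the example system Ax = b (4 equations, 2 unknowns).
A = [[1, -1], [1, 0], [1, 1], [1, 2]]
b = [2, -8, 12, 2]

At = transpose(A)
AtA = matmul(At, A)     # coefficient matrix of the normal system
Atb = matvec(At, b)     # right-hand side of the normal system

# Solve the 2 x 2 normal system by Cramer's rule.
det = AtA[0][0] * AtA[1][1] - AtA[0][1] * AtA[1][0]
x = [Fraction(Atb[0] * AtA[1][1] - AtA[0][1] * Atb[1], det),
     Fraction(AtA[0][0] * Atb[1] - Atb[0] * AtA[1][0], det)]

proj = matvec(A, x)                              # projection of b on col(A)
residual = [bi - pi for bi, pi in zip(b, proj)]  # b - Ax
print(AtA, Atb, x, proj, residual)
```

The residual should be orthogonal to every column of A, which is the condition $A^T(b - Ax) = 0$ derived above.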


5.5 Orthogonal Matrices and Change of Basis 


A basis that is suitable for one problem may not be suitable for another. We will 
study a process that changes one basis to another basis in a vector space and also 
discuss various problems related to the changes of basis. 


5.5.1 Orthogonal matrices 


We first introduce the following important matrices and then study their fundamental 
properties. 


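Definition A square matrix A is said to be an orthogonal matrix if

$$A^{-1} = A^T,$$

or equivalently, $A^T A = A A^T = I$.

Theorem 5.18 The following are equivalent for an n × n matrix A.

(a) A is orthogonal.


(b) The row (or column) vectors of A form an orthonormal set in $\mathbb{R}^n$ with respect
to the Euclidean inner product.


Proof We only consider the case of row vectors and the proof of the case of column
vectors is similar. Let $r_1, r_2, \ldots, r_n$ be the row vectors of A. We have

$$A A^T = \begin{bmatrix} r_1 \\ r_2 \\ \vdots \\ r_n \end{bmatrix} [\, r_1^T \mid r_2^T \mid \cdots \mid r_n^T \,] = \begin{bmatrix} r_1 r_1^T & r_1 r_2^T & \cdots & r_1 r_n^T \\ r_2 r_1^T & r_2 r_2^T & \cdots & r_2 r_n^T \\ \vdots & \vdots & & \vdots \\ r_n r_1^T & r_n r_2^T & \cdots & r_n r_n^T \end{bmatrix} = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}$$

if and only if

$$r_i r_j^T = \begin{cases} 1, & i = j, \\ 0, & i \neq j. \end{cases}$$

Thus, the result holds.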


Remark In fact, if an n × n matrix A is orthogonal, then the set of row vectors of
A forms an orthonormal basis for $\mathbb{R}^n$ and the set of column vectors of A forms an
orthonormal basis for $\mathbb{R}^n$ as well. 


The following two theorems are concerned with some properties of orthogonal 
matrices. 


Theorem 5.19 We have

(a) The inverse of an orthogonal matrix is orthogonal.


(b) A product of orthogonal matrices is orthogonal.


(c) If A is orthogonal, then $\det(A) = \pm 1$.

Proof For (a), let A be an orthogonal matrix. Then

$$A^{-1} (A^{-1})^T = A^{-1} (A^T)^{-1} = (A^T A)^{-1} = I^{-1} = I$$

and

$$(A^{-1})^T A^{-1} = (A^T)^{-1} A^{-1} = (A A^T)^{-1} = I^{-1} = I.$$

Therefore, $A^{-1}$ is also orthogonal.

For (b), let A and B be orthogonal matrices. Then

$$(AB)^T AB = B^T A^T A B = B^T B = I$$

and

$$AB (AB)^T = A B B^T A^T = A A^T = I.$$

Thus, AB is also orthogonal.

For (c), we have

$$1 = \det(I) = \det(A^T A) = \det(A^T) \det(A) = [\det(A)]^2.$$

Thus, $\det(A) = \pm 1$.

Theorem 5.20 The following are equivalent for a square matrix A.


(a) A is orthogonal.

(b) $\|Ax\| = \|x\|$ for all x.

(c) $Ax \cdot Ay = x \cdot y$ for all x and y.

Proof We only prove (a) ⇔ (c). The proof of (a) ⇔ (b) is left as an exercise.

(a) ⇒ (c): We have for any x and y,

$$Ax \cdot Ay = (Ax)^T A y = x^T A^T A y = x^T y = x \cdot y.$$

(c) ⇒ (a): Since for any x and y,

$$x^T A^T A y = Ax \cdot Ay = x \cdot y = x^T I y,$$

we obtain

$$x^T (A^T A - I) y = 0. \tag{5.12}$$

Because (5.12) holds for all vectors x and y, choosing x and y to be standard basis
vectors $e_i$ and $e_j$ shows that every entry of $A^T A - I$ is zero. It follows that

$$A^T A - I = 0, \quad \text{i.e.,} \quad A^T A = I.$$

Therefore, A is orthogonal.
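
Theorems 5.19 (c) and 5.20 can be checked on a small case. The sketch below uses a hypothetical 2 × 2 matrix built from the 3-4-5 right triangle (so all entries are exact fractions) and two arbitrary test vectors; it verifies $A^T A = I$, that inner products and squared norms are preserved, and that the determinant is 1.

```python
from fractions import Fraction as F

def dot(a, b):
    """Euclidean inner product of two vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))

def matvec(M, x):
    return [dot(row, x) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

# Hypothetical orthogonal matrix with rational entries.
A = [[F(3, 5), F(-4, 5)],
     [F(4, 5), F(3, 5)]]

# Entry (i, j) of A^T A is the inner product of columns i and j of A.
cols = transpose(A)
AtA = [[dot(ci, cj) for cj in cols] for ci in cols]

x, y = [2, -1], [5, 7]
Ax, Ay = matvec(A, x), matvec(A, y)
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]

print(AtA)
print(dot(Ax, Ay), dot(x, y))   # inner product before and after
print(dot(Ax, Ax), dot(x, x))   # squared norm before and after
print(det)
```

A single pair of vectors is of course only a spot check; the theorem asserts the identity for all x and y.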


5.5.2 Change of basis 


If we change the basis for a vector space V from an old basis B to a new basis B′,
how is the old coordinate vector $[v]_B$ of a vector $v \in V$ related to the new coordinate
vector $[v]_{B'}$? More precisely, let $B = \{u_1, u_2, \ldots, u_n\}$ be a basis for V. For any
vector $v \in V$, we have

$$v = \sum_{j=1}^{n} k_j u_j.$$

Denote the coordinate vector $[v]_B$ of v by

$$[v]_B = \begin{bmatrix} k_1 \\ k_2 \\ \vdots \\ k_n \end{bmatrix}.$$

If we change the basis for V from the old basis B to a new basis

$$B' = \{u_1', u_2', \ldots, u_n'\},$$

then the old coordinate vector $[v]_B$ of v is related to the new coordinate vector $[v]_{B'}$
of the same vector v by the equation

$$[v]_B = P [v]_{B'}, \tag{5.13}$$

where the columns of the matrix P are the coordinate vectors of the new basis
vectors relative to the old basis, i.e.,

$$P = [\, [u_1']_B \mid [u_2']_B \mid \cdots \mid [u_n']_B \,].$$

The matrix P usually is called the transition matrix from B to B′.

Now we want to show that (5.13) is true. Actually,

$$u_i' = [u_1, u_2, \ldots, u_n]\, [u_i']_B$$

for $1 \le i \le n$. Thus,

$$[u_1', u_2', \ldots, u_n'] = [u_1, u_2, \ldots, u_n]\, P.$$

For any $v \in V$, we have

$$v = \sum_{j=1}^{n} k_j' u_j' = [u_1', u_2', \ldots, u_n'] \begin{bmatrix} k_1' \\ k_2' \\ \vdots \\ k_n' \end{bmatrix} = [u_1, u_2, \ldots, u_n]\, P \begin{bmatrix} k_1' \\ k_2' \\ \vdots \\ k_n' \end{bmatrix} \tag{5.14}$$

and

$$v = \sum_{j=1}^{n} k_j u_j = [u_1, u_2, \ldots, u_n] \begin{bmatrix} k_1 \\ k_2 \\ \vdots \\ k_n \end{bmatrix}. \tag{5.15}$$

Combining (5.14) and (5.15) yields

$$[u_1, u_2, \ldots, u_n] \left( P \begin{bmatrix} k_1' \\ k_2' \\ \vdots \\ k_n' \end{bmatrix} - \begin{bmatrix} k_1 \\ k_2 \\ \vdots \\ k_n \end{bmatrix} \right) = 0.$$

Since $\{u_1, u_2, \ldots, u_n\}$ is linearly independent, we have

$$P \begin{bmatrix} k_1' \\ k_2' \\ \vdots \\ k_n' \end{bmatrix} - \begin{bmatrix} k_1 \\ k_2 \\ \vdots \\ k_n \end{bmatrix} = 0, \quad \text{i.e.,} \quad P [v]_{B'} - [v]_B = 0.$$

Hence

$$[v]_B = P [v]_{B'}.$$

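Example Consider the bases $B = \{u_1, u_2\}$ and $B' = \{u_1', u_2'\}$ for $\mathbb{R}^2$, where

$$u_1 = [1, 0], \quad u_2 = [0, 1]; \qquad u_1' = [1, 2], \quad u_2' = [3, 1].$$

(a) Find the transition matrix from B to B′.

(b) Use (5.13) to find $[v]_B$ for a given coordinate vector $[v]_{B'}$.


Solution For (a), we must find the coordinate vectors of the new basis vectors $u_1'$
and $u_2'$ relative to the old basis B. We have by inspection

$$\begin{aligned}
u_1' &= u_1 + 2u_2, \\
u_2' &= 3u_1 + u_2,
\end{aligned}$$

so that

$$[u_1']_B = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \quad \text{and} \quad [u_2']_B = \begin{bmatrix} 3 \\ 1 \end{bmatrix}.$$

Then the transition matrix P from B to B′ is given by

$$P = [\, [u_1']_B \mid [u_2']_B \,] = \begin{bmatrix} 1 & 3 \\ 2 & 1 \end{bmatrix}.$$

For (b), using (5.13) and the transition matrix in (a), we obtain

$$[v]_B = P [v]_{B'} = \begin{bmatrix} 1 & 3 \\ 2 & 1 \end{bmatrix} [v]_{B'}.$$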


In the above example, the transition matrix Q from B′ to B is

$$Q = \begin{bmatrix} -\frac{1}{5} & \frac{3}{5} \\ \frac{2}{5} & -\frac{1}{5} \end{bmatrix}.$$

If we compute the product of the two transition matrices above, we find that PQ = I,
which implies that $Q = P^{-1}$. The following theorem shows that this holds for every
case.


Theorem 5.21 Let P be the transition matrix from a basis B to a basis B′. Then
P is invertible and $P^{-1}$ is the transition matrix from B′ to B.

Proof Let Q be the transition matrix from B′ to B. Then

$$[x]_B = P [x]_{B'}, \qquad [x]_{B'} = Q [x]_B.$$

Thus, for any x, we obtain

$$[x]_B = PQ [x]_B.$$

Therefore,

$$PQ = I, \quad \text{i.e.,} \quad Q = P^{-1}.$$
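
A minimal sketch of Theorem 5.21 for the 2 × 2 case follows. It takes the matrix P with columns [1, 2] and [3, 1] (the coordinate vectors found in the example above), inverts it with the usual 2 × 2 formula, and checks that converting coordinates with P and then with Q returns the starting vector. The coordinate vector `v_new` is a made-up test value, and the helper names are assumptions for this illustration.

```python
from fractions import Fraction

def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def matmul(M, N):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*N)]
            for row in M]

def inverse_2x2(M):
    """Inverse of a 2 x 2 matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

P = [[1, 3],
     [2, 1]]              # columns: [u1']_B and [u2']_B
Q = inverse_2x2(P)        # candidate transition matrix in the other direction

v_new = [1, 1]                 # hypothetical coordinates relative to B'
v_old = matvec(P, v_new)       # [v]_B = P [v]_B'
round_trip = matvec(Q, v_old)  # back to coordinates relative to B'

print(Q)
print(matmul(P, Q))
print(v_old, round_trip)
```

With the standard basis as B, `v_old` is simply the vector $1 \cdot u_1' + 1 \cdot u_2'$ written out in components.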


Theorem 5.21 illustrates that a transition matrix is always invertible. The following 
theorem shows that the transition matrix from one orthonormal basis to another 
orthonormal basis is orthogonal. 


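Theorem 5.22 Let P be the transition matrix from one orthonormal basis to
another orthonormal basis for an inner product space V. Then P is an orthogonal
matrix.


Proof Let $B = \{u_1, u_2, \ldots, u_n\}$ and $B' = \{v_1, v_2, \ldots, v_n\}$ be the two orthonormal
bases for V, and $P = [p_{ij}]$ be the n × n transition matrix from B to B′. Then

$$[v_1, v_2, \ldots, v_n] = [u_1, u_2, \ldots, u_n] \begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1n} \\ p_{21} & p_{22} & \cdots & p_{2n} \\ \vdots & \vdots & & \vdots \\ p_{n1} & p_{n2} & \cdots & p_{nn} \end{bmatrix}.$$

Because $\{v_1, v_2, \ldots, v_n\}$ is an orthonormal basis, we have

$$\langle v_i, v_j\rangle = \begin{cases} 1, & i = j; \\ 0, & i \neq j. \end{cases} \tag{5.16}$$

On the other hand, by using the orthonormal property of $\{u_1, u_2, \ldots, u_n\}$, we have
for any i and j,

$$\langle v_i, v_j\rangle = \Big\langle \sum_{s=1}^{n} u_s p_{si}, \sum_{r=1}^{n} u_r p_{rj} \Big\rangle = \sum_{s=1}^{n} \sum_{r=1}^{n} p_{si} p_{rj} \langle u_s, u_r\rangle = \sum_{s=1}^{n} p_{si} p_{sj}. \tag{5.17}$$

Combining (5.16) and (5.17), we deduce

$$p_{1i} p_{1j} + p_{2i} p_{2j} + \cdots + p_{ni} p_{nj} = \begin{cases} 1, & i = j; \\ 0, & i \neq j. \end{cases}$$

Thus, $P^T P = I$, i.e., P is an orthogonal matrix.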

Elementary exercises 

5.1 Let $u = [u_1, u_2]$, $v = [v_1, v_2] \in \mathbb{R}^2$. Show that

$$\langle u, v\rangle = 4u_1 v_1 + 3u_2 v_2$$

defines an inner product. 


5.2 Let $p = a_0 + a_1 x + a_2 x^2$ and $q = b_0 + b_1 x + b_2 x^2$ be any two polynomials in
$P_2$.


(a) Show that $\langle p, q\rangle = a_0 b_0 + a_1 b_1 + a_2 b_2$ defines an inner product on $P_2$.
(b) Use the inner product in (a) to find d(p,q) if 
p=-34+2+2’, q= 1+ 2x — 42”. 
5.3 Let $\langle u, v\rangle$ be the Euclidean inner product on $\mathbb{R}^n$. Show that for any $A \in \mathbb{R}^{n \times n}$,

$$\langle u, Av\rangle = \langle A^T u, v\rangle.$$

5.4 Let u and v be vectors in a real inner product space. Show that
(a) Theorem 5.2 holds, i.e., $|\langle u, v\rangle| \le \|u\| \cdot \|v\|$.


(b) The equality holds in the Cauchy-Schwarz inequality if and only if u and v are 
linearly dependent. 


5.5 Prove Theorem 5.4. 


5.6 Consider R3 with the Euclidean inner product. For which values of k, are u 
and v orthogonal? 


@ v= Weeks aek wai: 
5.7 Let $S = \operatorname{span}\{v_1, v_2, v_3\}$. Check whether the vector $u \in S^\perp$.
(a) $v_1 = [1, 0, 0, 0]$, $v_2 = [0, 3, 0, 0]$, $v_3 = [5, 2, 1, 0]$, and $u = [0, 0, 0, 1]$.
(b) $v_1 = [1, -2, 3, 1]$, $v_2 = [2, 0, 3, 5]$, $v_3 = [0, 1, 2, 5]$, and $u = [0, 1, 3, 0]$.
(c) v1 =([3,4,1,7], vo = [1,0,3,1], v3 = [-1,2, 1,1], and as i01]: 


5.8 Show that if u is orthogonal to each of the vectors v1, v2,...,Vn, then it is 
also orthogonal to span{v1, V2,..., Vn}. 

5.9 Prove Theorem 5.9 (a) and (b). 


5.10 Let $\{u_1, u_2, \ldots, u_n\}$ be an orthogonal set in an inner product space V. Show
that

$$\|u_1 + u_2 + \cdots + u_n\|^2 = \|u_1\|^2 + \|u_2\|^2 + \cdots + \|u_n\|^2.$$


5.11 Let $S = \{v_1, v_2, \ldots, v_n\}$ be an orthonormal basis for an inner product space
V. Show that

$$\|a_1 v_1 + a_2 v_2 + \cdots + a_n v_n\|^2 = a_1^2 + a_2^2 + \cdots + a_n^2,$$

where $a_1, a_2, \ldots, a_n \in \mathbb{R}$.


5.12 Find the orthogonal projection of v onto the subspace W spanned by vectors
$u_1$ and $u_2$, where $v = [1, 2, 3]$, $u_1 = [2, -2, 1]$, and $u_2 = [-1, 1, 4]$.


5.13 Let $u = [1, 1, 1, 1]$ and $S = \operatorname{span}\{v_1, v_2, v_3, v_4\}$, where

$$v_1 = [0, -3, 2, 2], \quad v_2 = [1, -1, 0, 1], \quad v_3 = [3, 0, -2, 1], \quad v_4 = [1, 2, -2, -1].$$

(a) Find a subset of $\{v_1, v_2, v_3, v_4\}$ that forms a basis for the space S.


(b) Express each vector which is not in the basis as a linear combination of the 
basis vectors. 


(c) Find the orthogonal projection of u onto S and the component of u orthogonal 
to S. 


5.14 Consider $\mathbb{R}^3$ with the Euclidean inner product. Apply the Gram-Schmidt
process to transform the basis vectors

$$u_1 = [1, 0, 0], \quad u_2 = [0, 4, 1], \quad u_3 = [3, 7, -2]$$

into an orthonormal basis.


5.15 Consider $\mathbb{R}^4$ with the Euclidean inner product. Apply the Gram-Schmidt
process to transform the basis vectors

$$u_1 = [1, 2, 0, -1], \quad u_2 = [1, 0, 0, 1], \quad u_3 = [0, 2, 1, 0], \quad u_4 = [1, -1, 0, 0]$$

into an orthonormal basis.

5.16 Find an orthogonal basis for $\mathbb{R}^3$ that contains a vector $v_1 = [0, 2, 3]$.


5.17 Find an orthonormal basis for R* that contains vectors 
poa 
’ 9? $ 


vı = [1,0,0,0], V3 = 5 D 

5.18 Find the QR-decomposition of the matrix. 


2 8 2 1 2 DA 
eats 3 b E 

(a) plea a sa ee (c) E a ta 

1 -1 0 


5.19 Let $u_1 = [1, 2, -1]$, $u_2 = [5, -2, 1]$, and $v = [3, 2, 5]$.
(a) Find the best approximation to v from $W = \operatorname{span}\{u_1, u_2\}$.
(b) Find $\min_{w \in W} \|v - w\|$.


5.20 Find the least squares solution of each linear system Ax = b, and find the 
orthogonal projection of b on the column space of A. 


ae 1 Sct R 

(a) A=} -1 1], b=] -1 (b) A= ; , b= ; 
1 -2 1 

—1 2 3 


5.21 Determine which of the following matrices are orthogonal. 


1 2 2 5 3 3 3 3 
—1 3 
1 1 3 —5 1 1 
(a) = 2 —2 1 (b) = (c) 
3 2 lini 4 BY is a ig 
—2 -1 2 3 1 —-5 1 


5.22 Consider the bases B = {uj, u2} and C = {v1, v2} for R?, where 
ui = [1,0], u2 = [1,1]; YLS [f=]; v2= (0, 1]. 


(a) Find the transition matrix P from B to C, and the transition matrix Q from 
C to B. 


(b) Find the coordinate vector [x]c if x = [8, —4]. 
(c) Find [x]g according to the coordinate vector [x]c in (b). 
5.23 Consider the bases $B = \{u_1, u_2, u_3\}$ and $C = \{v_1, v_2, v_3\}$ for $\mathbb{R}^3$, where

$$u_1 = [1, 0, -1], \quad u_2 = [2, 1, 1], \quad u_3 = [1, 1, 1];$$
$$v_1 = [1, 1, 0], \quad v_2 = [0, 1, 1], \quad v_3 = [1, 2, 1].$$


(a) Find the transition matrix P from B to C, and the transition matrix Q from 
C to B. 


(b) Find $[w]_B$ if $[w]_C = [1, -3, -5]$.

Challenge exercises 


5.24 Show that

$$\|u + v\|^2 + \|u - v\|^2 = 2\|u\|^2 + 2\|v\|^2$$

holds for any vectors u and v in a real inner product space. 


5.25 Let f(x) and g(x) be two functions in C[0, 1]. Show that

$$\left( \int_0^1 f(x) g(x)\, dx \right)^2 \le \left( \int_0^1 f^2(x)\, dx \right) \left( \int_0^1 g^2(x)\, dx \right)$$

and

$$\left( \int_0^1 [f(x) + g(x)]^2\, dx \right)^{1/2} \le \left( \int_0^1 f^2(x)\, dx \right)^{1/2} + \left( \int_0^1 g^2(x)\, dx \right)^{1/2}.$$


5.26 Find the nullspace S of the augmented matrix of the following system and
then find $S^\perp$.


vy 2x9 i 323 — 4x4 =0 


M+ 5x2 "Tr 323 + 304 = 0. 


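5.27 Let $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_p\}$ be an orthonormal basis for a subspace $W$ of $\mathbb{R}^n$. Show
that for any $\mathbf{y} \in \mathbb{R}^n$,
\[ \operatorname{proj}_W \mathbf{y} = U U^T \mathbf{y}, \]


where $U$ is an $n \times p$ matrix given by $U = [\, \mathbf{u}_1 \mid \mathbf{u}_2 \mid \cdots \mid \mathbf{u}_p \,]$.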


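5.28 Let $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ be an orthonormal set in $\mathbb{R}^n$ and $\mathbf{x} \in \mathbb{R}^n$. Show
that
\[ \|\mathbf{x}\|^2 \ge \langle \mathbf{x}, \mathbf{v}_1 \rangle^2 + \langle \mathbf{x}, \mathbf{v}_2 \rangle^2 + \cdots + \langle \mathbf{x}, \mathbf{v}_k \rangle^2. \]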


5.29 Let A be a symmetric matrix. Show that Ax = b has a solution if and only 
if b is orthogonal to the nullspace of A. 


5.30 Let $W$ be the plane $x - y + 2z = 0$ in $\mathbb{R}^3$, and $\mathbf{v} = [3, -1, 2]$.
(a) Find the orthogonal projection of v onto W. 
(b) Find the component of v orthogonal to W. 


5.31 Let Ax = b be consistent. Show that if A has linearly independent column 
vectors, then the least squares solution of Ax = b is the same as the exact solution 
of Ax = b. 


5.32 Let A be a matrix with linearly independent column vectors and b be a vector 
orthogonal to the column space of A. Show that the least squares solution of Ax = b 
is x = 0. 
5.33 Show that $A$ is orthogonal if and only if $\|A\mathbf{x}\| = \|\mathbf{x}\|$ for all $\mathbf{x}$.


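5.34 Let $B$ and $C$ be bases for $\mathbb{R}^2$ and $C = \{\mathbf{u}_1, \mathbf{u}_2\}$, where $\mathbf{u}_1 = [1, 2]$, $\mathbf{u}_2 = [2, 3]$.
Find the basis $B$ if the transition matrix from $B$ to $C$ is


\[ \begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix}. \]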


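5.35 Let $\mathbf{x} \in \mathbb{R}^n$ with $\|\mathbf{x}\| = 1$. Suppose that $\mathbf{x}$ has a partition as


\[ \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} x_1 \\ \mathbf{y} \end{bmatrix}, \]


where $\mathbf{y} \in \mathbb{R}^{n-1}$, and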
Tı 


Q= 1 ; 
y tak yy” 
a 


where I is the (n — 1) x (n — 1) identity matrix. Show that Q is orthogonal. 


Chapter 6 


Eigenvalues and Eigenvectors 


“ Eigenvalues are in everything. There is an eigenvalue in the burrito you are going to eat 
for lunch today.” 


— A linear algebra professor 


“Nature hides her secrets because of her essential loftiness, but not by means of ruse.” 


— Albert Einstein 


In this chapter, we study eigenvalues and eigenvectors of a square matrix $A$. An
eigenvector $\mathbf{x} \neq \mathbf{0}$ of $A$ is a special vector that is mapped to a scalar multiple of
itself when it is multiplied by $A$, i.e., $A\mathbf{x} = \lambda\mathbf{x}$ for some value $\lambda$. Such a value is then called an
eigenvalue of $A$. An essential highlight of this chapter is the diagonalization problem.


6.1 Eigenvalues and Eigenvectors 


We introduce the concepts of eigenvalue, eigenvector, and eigenspace and then 
present methods to compute them through examples. 


6.1.1 Introduction to eigenvalues and eigenvectors 


Definition Let $A$ be a square matrix. If a nonzero vector $\mathbf{x}$ satisfies $A\mathbf{x} = \lambda\mathbf{x}$, where
$\lambda$ is a scalar, then $\lambda$ is called an eigenvalue of $A$ and $\mathbf{x}$ is called an eigenvector
of $A$ corresponding to $\lambda$.


We observe that if $A$ is an $n \times n$ matrix and $\lambda$ is a scalar, then the following are
equivalent.


(1) There exists $\mathbf{x} \neq \mathbf{0}$ such that $A\mathbf{x} = \lambda\mathbf{x}$.
(2) The system of equations $(\lambda I - A)\mathbf{x} = \mathbf{0}$ has nontrivial solutions.
(3) $\det(\lambda I - A) = 0$.


The equation $\det(\lambda I - A) = 0$ is called the characteristic equation of $A$. Any
scalar satisfying the equation $\det(\lambda I - A) = 0$ is an eigenvalue of $A$. In particular, if $A$
is a triangular matrix, then the eigenvalues of $A$ are the entries on the main diagonal
of $A$. In fact, for an $n \times n$ matrix $A$, $\det(\lambda I - A)$ is a polynomial in $\lambda$ of degree $n$,
which is called the characteristic polynomial of $A$.


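Example Find the eigenvalues of


\[ A = \begin{bmatrix} 3 & 2 & 4 \\ 2 & 0 & 2 \\ 4 & 2 & 3 \end{bmatrix}. \]


Solution The characteristic polynomial of $A$ is


\[ \det(\lambda I - A) = \det \begin{bmatrix} \lambda - 3 & -2 & -4 \\ -2 & \lambda & -2 \\ -4 & -2 & \lambda - 3 \end{bmatrix} = \lambda^3 - 6\lambda^2 - 15\lambda - 8. \]


The eigenvalues of $A$ therefore satisfy
\[ \lambda^3 - 6\lambda^2 - 15\lambda - 8 = 0 \implies (\lambda - 8)(\lambda + 1)^2 = 0. \]


Thus, the distinct eigenvalues of $A$ are


\[ \lambda = 8, \quad \lambda = -1. \]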
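

The computation in this example can be checked numerically. The following is a minimal sketch in Python, assuming the matrix of the example is $A = [3, 2, 4;\ 2, 0, 2;\ 4, 2, 3]$ (this is inferred from the determinant displayed in the solution); the helper names `det3` and `char_poly_value` are illustrative and not part of the text.

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def char_poly_value(A, lam):
    """Evaluate det(lam*I - A) for a 3x3 matrix A."""
    n = len(A)
    M = [[(lam if r == c else 0) - A[r][c] for c in range(n)] for r in range(n)]
    return det3(M)


# Assumed matrix of the example (inferred from the displayed determinant).
A = [[3, 2, 4],
     [2, 0, 2],
     [4, 2, 3]]

# The claimed eigenvalues are roots of the characteristic polynomial.
for lam in (8, -1):
    assert char_poly_value(A, lam) == 0

# The determinant agrees with the expanded cubic at several sample points.
for lam in range(-3, 4):
    assert char_poly_value(A, lam) == lam**3 - 6 * lam**2 - 15 * lam - 8
```

Because all entries are integers, the check is exact and no rounding tolerance is needed.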


Remark A real matrix may have complex eigenvalues. For instance, let


\[ A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}. \]
The characteristic polynomial of $A$ is
\[ \det(\lambda I - A) = \det \begin{bmatrix} \lambda & 1 \\ -1 & \lambda \end{bmatrix} = \lambda^2 + 1. \]


The eigenvalues of $A$ are the roots of $\lambda^2 + 1 = 0$ and therefore $\lambda = i$ and $\lambda = -i$,
where $i^2 = -1$. For a review of complex numbers, we refer to Subsection 8.3.1.


6.1.2 Two theorems concerned with eigenvalues 


We list two theorems concerned with some properties of eigenvalues and eigenvectors.
The first theorem indicates a simple way to find the eigenvalues and eigenvectors of
any positive integer power of a matrix $A$ once the eigenvalues and eigenvectors of
$A$ are found. The second one demonstrates a relationship between eigenvalues and
the invertibility of a matrix.
Theorem 6.1 Let $k$ be a positive integer, $\lambda$ be an eigenvalue of a matrix $A$, and $\mathbf{x}$ be
a corresponding eigenvector. Then $\lambda^k$ is an eigenvalue of $A^k$ and $\mathbf{x}$ is a corresponding
eigenvector.


Proof Since $A\mathbf{x} = \lambda\mathbf{x}$, we have


\[ A^2\mathbf{x} = A(A\mathbf{x}) = A(\lambda\mathbf{x}) = \lambda A\mathbf{x} = \lambda(\lambda\mathbf{x}) = \lambda^2\mathbf{x}. \]


By induction, one can prove easily that $A^k\mathbf{x} = \lambda^k\mathbf{x}$ for any positive integer $k$.
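

As an illustration of Theorem 6.1, the sketch below applies a matrix repeatedly to one of its eigenvectors and compares the result with $\lambda^k\mathbf{x}$. The 2 x 2 matrix, the eigenvector, and the helper `mat_vec` are hypothetical choices made for this illustration only.

```python
def mat_vec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


# Hypothetical example: x = [1, 1] is an eigenvector of A for lam = 3.
A = [[2, 1],
     [1, 2]]
x = [1, 1]
lam = 3

y = x
for k in range(1, 6):
    y = mat_vec(A, y)                          # y now equals A^k x
    assert y == [lam**k * xi for xi in x]      # Theorem 6.1: A^k x = lam^k x
```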


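Theorem 6.2 A matrix $A$ is invertible if and only if $\lambda = 0$ is not an eigenvalue of $A$.


Proof The statement in the theorem is equivalent to the following: $A$ is not invertible
if and only if $\lambda = 0$ is an eigenvalue of $A$. So we prove this equivalent statement.
If $\lambda = 0$ is an eigenvalue of $A$, then


\[ A\mathbf{x} = 0\mathbf{x} = \mathbf{0} \]


has a nonzero solution $\mathbf{x}$, i.e., $A$ is not invertible. Conversely, if $A$ is not invertible,
then $A\mathbf{x} = \mathbf{0}$ has a nonzero solution $\mathbf{x}$, so $A\mathbf{x} = 0\mathbf{x}$ and $\lambda = 0$ is an eigenvalue of $A$.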


6.1.3 Bases for eigenspaces 


We know that the eigenvectors corresponding to $\lambda$ are the nonzero vectors in the
solution space of $(\lambda I - A)\mathbf{x} = \mathbf{0}$. This solution space is called the eigenspace of $A$
corresponding to $\lambda$. The following example shows how to find bases for the eigenspaces of
a matrix $A$.


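Example Find bases for the eigenspaces of


\[ A = \begin{bmatrix} 1 & 1 & 3 \\ 0 & 3 & 0 \\ 2 & 2 & 0 \end{bmatrix}. \]


Solution The characteristic equation of $A$ is
\[ \lambda^3 - 4\lambda^2 - 3\lambda + 18 = (\lambda + 2)(\lambda - 3)^2 = 0. \]


Thus, the distinct eigenvalues of $A$ are $\lambda = -2$ and $\lambda = 3$. There are two
eigenspaces of $A$. By definition, we know that $\mathbf{x} = [x_1, x_2, x_3]^T$ is an eigenvector of
$A$ corresponding to $\lambda$ if and only if $\mathbf{x}$ is a nontrivial solution of


\[ (\lambda I - A)\mathbf{x} = \mathbf{0}, \quad \text{i.e.,} \quad \begin{bmatrix} \lambda - 1 & -1 & -3 \\ 0 & \lambda - 3 & 0 \\ -2 & -2 & \lambda \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}. \tag{6.1} \]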
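If $\lambda = -2$, then (6.1) becomes


\[ \begin{bmatrix} -3 & -1 & -3 \\ 0 & -5 & 0 \\ -2 & -2 & -2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}. \]


Solving this system yields


\[ x_1 = -s, \quad x_2 = 0, \quad x_3 = s. \]
Thus, the eigenvectors corresponding to $\lambda = -2$ are the nonzero vectors of the form
\[ \mathbf{x} = \begin{bmatrix} -s \\ 0 \\ s \end{bmatrix} = s \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}. \]
Hence $[-1, 0, 1]^T$ is a basis for the eigenspace corresponding to $\lambda = -2$.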


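If $\lambda = 3$, then (6.1) becomes


\[ \begin{bmatrix} 2 & -1 & -3 \\ 0 & 0 & 0 \\ -2 & -2 & 3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}. \]


Solving this system yields
\[ x_1 = 3s, \quad x_2 = 0, \quad x_3 = 2s. \]


Thus, the eigenvectors of $A$ corresponding to $\lambda = 3$ are the nonzero vectors of the form
\[ \mathbf{x} = \begin{bmatrix} 3s \\ 0 \\ 2s \end{bmatrix} = s \begin{bmatrix} 3 \\ 0 \\ 2 \end{bmatrix}. \]


Hence $[3, 0, 2]^T$ is a basis for the eigenspace corresponding to $\lambda = 3$.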
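

The two basis vectors found above can be checked directly against the definition $A\mathbf{x} = \lambda\mathbf{x}$. This is a minimal sketch, assuming the matrix of the example is $A = [1, 1, 3;\ 0, 3, 0;\ 2, 2, 0]$, which is the matrix implied by the system for $\lambda = 3$; the helper `mat_vec` is an illustrative name.

```python
def mat_vec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


# Assumed matrix of the example (implied by the system (lam*I - A)x = 0 for lam = 3).
A = [[1, 1, 3],
     [0, 3, 0],
     [2, 2, 0]]

# (eigenvalue, basis vector of the corresponding eigenspace)
claimed = [(-2, [-1, 0, 1]),
           (3, [3, 0, 2])]

for lam, v in claimed:
    # Each basis vector must satisfy A v = lam v.
    assert mat_vec(A, v) == [lam * vi for vi in v]
```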


6.2 Diagonalization 


We are concerned with the problem of finding a basis for $\mathbb{R}^n$ that consists of
eigenvectors of a matrix $A \in \mathbb{R}^{n \times n}$ because such a basis helps us to simplify
numerical computations involving $A$. In this section, our goal is to show that this
problem is actually equivalent to a diagonalization problem.


6.2.1 Diagonalization problem


Given a square matrix $A$, does there exist an invertible matrix $P$ such that $P^{-1}AP$
is a diagonal matrix? This problem is called the diagonalization problem.
A square matrix $A$ is called diagonalizable if there is an invertible matrix $P$ such
that $P^{-1}AP$ is a diagonal matrix.
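Theorem 6.3 Let $A$ be an $n \times n$ matrix. Then the following are equivalent.
(a) $A$ is diagonalizable.
(b) $A$ has $n$ linearly independent eigenvectors.


Proof (a) $\Rightarrow$ (b): Since $A$ is diagonalizable, there exists an invertible matrix $P$ such
that
\[ P^{-1}AP = D, \]


where $D$ is a diagonal matrix with diagonal entries $\lambda_1, \lambda_2, \ldots, \lambda_n$, i.e.,
\[ D = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n). \]


Let
\[ P = [\, \mathbf{p}_1 \mid \mathbf{p}_2 \mid \cdots \mid \mathbf{p}_n \,], \]


where $\mathbf{p}_i$ ($1 \le i \le n$) are the column vectors of $P$. Then from $P^{-1}AP = D$, we have


\[ AP = PD, \]
which implies
\[ [\, A\mathbf{p}_1 \mid A\mathbf{p}_2 \mid \cdots \mid A\mathbf{p}_n \,] = [\, \lambda_1\mathbf{p}_1 \mid \lambda_2\mathbf{p}_2 \mid \cdots \mid \lambda_n\mathbf{p}_n \,]. \]
Therefore, for any $i$,
\[ A\mathbf{p}_i = \lambda_i\mathbf{p}_i, \]
i.e., $\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_n$ are eigenvectors of $A$ corresponding to the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$,
respectively. Since $P$ is invertible, the column vectors $\mathbf{p}_i$ ($1 \le i \le n$) are linearly
independent. Thus, $A$ has $n$ linearly independent eigenvectors.


(b) $\Rightarrow$ (a): Conversely, if $A$ has $n$ linearly independent eigenvectors $\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_n$, then
the matrix $P$ with these vectors as columns is invertible and satisfies $AP = PD$, so that $P^{-1}AP = D$ is diagonal.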


6.2.2 Procedure for diagonalization 


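Recall that an $n \times n$ matrix $A$ with $n$ linearly independent eigenvectors is
diagonalizable. The following procedure provides a method for diagonalizing $A$.


(1) Find the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ of $A$.


(2) If there are $n$ linearly independent eigenvectors of $A$, say, $\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_n$
corresponding to $\lambda_1, \lambda_2, \ldots, \lambda_n$, then we can construct a matrix


\[ P = [\, \mathbf{p}_1 \mid \mathbf{p}_2 \mid \cdots \mid \mathbf{p}_n \,]. \]


(3) The matrix $P^{-1}AP = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$.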
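Example 1 Find a matrix $P$ that diagonalizes


\[ A = \begin{bmatrix} 1 & 2 & 0 \\ 0 & 3 & 0 \\ 2 & -4 & 2 \end{bmatrix}. \]


Solution The characteristic equation of $A$ is
\[ \det(\lambda I - A) = (\lambda - 1)(\lambda - 2)(\lambda - 3) = 0. \]


Thus, the eigenvalues of $A$ are $\lambda = 1$, $\lambda = 2$, and $\lambda = 3$. The corresponding
eigenvectors are given as follows:


\[ \lambda = 1,\ \mathbf{p}_1 = \begin{bmatrix} -1 \\ 0 \\ 2 \end{bmatrix}; \quad \lambda = 2,\ \mathbf{p}_2 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}; \quad \lambda = 3,\ \mathbf{p}_3 = \begin{bmatrix} -1 \\ -1 \\ 2 \end{bmatrix}. \]
In fact, $\mathbf{p}_1$, $\mathbf{p}_2$, and $\mathbf{p}_3$ are linearly independent. We can construct an invertible
matrix
\[ P = [\, \mathbf{p}_1 \mid \mathbf{p}_2 \mid \mathbf{p}_3 \,] = \begin{bmatrix} -1 & 0 & -1 \\ 0 & 0 & -1 \\ 2 & 1 & 2 \end{bmatrix} \]
such that
\[ P^{-1}AP = \operatorname{diag}(1, 2, 3). \]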

Example 2 Let 


1 0 
A=| -l 3 0 
0 Sh 1 


Is A diagonalizable? 
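Solution The characteristic equation of $A$ is
\[ \det(\lambda I - A) = (\lambda - 1)(\lambda - 2)^2 = 0. \]


Thus, the distinct eigenvalues of $A$ are $\lambda = 1$ and $\lambda = 2$. The corresponding
eigenvectors are given as follows:


\[ \lambda = 1,\ \mathbf{p}_1 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}; \quad \lambda = 2,\ \mathbf{p}_2 = \begin{bmatrix} -1 \\ -1 \\ 1 \end{bmatrix}. \]


Since $A$ is a $3 \times 3$ matrix and there are only two linearly independent eigenvectors
in total, $A$ is not diagonalizable.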
Remark If an $n \times n$ matrix $A$ is diagonalizable, then it is much easier for us to
compute the powers of $A$. More precisely, if $P^{-1}AP = D$, where


\[ D = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n), \]


we then have
\[ A = PDP^{-1}, \]


which implies


\[ A^k = \underbrace{(PDP^{-1})(PDP^{-1}) \cdots (PDP^{-1})}_{k} = P \underbrace{D D \cdots D}_{k} P^{-1} = P D^k P^{-1}, \]


where $D^k = \operatorname{diag}(\lambda_1^k, \lambda_2^k, \ldots, \lambda_n^k)$.
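

The formula for powers can be tried out on the matrix of Example 1 above. The following sketch computes $PD^kP^{-1}$ with exact rational arithmetic and compares it with repeated multiplication; the function names `inverse` and `power_by_diagonalization` are illustrative assumptions, and the Gauss-Jordan routine assumes its input is invertible.

```python
from fractions import Fraction


def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def inverse(M):
    """Gauss-Jordan inverse with exact rationals (M is assumed invertible)."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [x / aug[col][col] for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def power_by_diagonalization(P, eigenvalues, k):
    """Return P D^k P^{-1}, where D = diag(eigenvalues)."""
    n = len(P)
    Dk = [[eigenvalues[i] ** k if i == j else 0 for j in range(n)]
          for i in range(n)]
    return mat_mul(mat_mul(P, Dk), inverse(P))


A = [[1, 2, 0], [0, 3, 0], [2, -4, 2]]      # matrix of Example 1
P = [[-1, 0, -1], [0, 0, -1], [2, 1, 2]]    # its eigenvectors as columns

k = 5
direct = A
for _ in range(k - 1):
    direct = mat_mul(direct, A)             # A^k by repeated multiplication

assert power_by_diagonalization(P, [1, 2, 3], k) == direct
```

Only the diagonal entries are raised to the $k$-th power, so the cost of the diagonalized form does not grow with the number of matrix products that direct multiplication would need.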
6.2.3 Two theorems concerned with diagonalization 


From the examples in the previous subsection, one may guess that basis vectors from
various eigenspaces of $A$ are linearly independent. The following theorem gives the
proof.


Theorem 6.4 Let $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ be eigenvectors of $A$ corresponding to distinct
eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_k$. Then $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ is linearly independent.


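Proof By contradiction, we assume that $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ is linearly dependent.
Since an eigenvector is nonzero by definition, $\{\mathbf{v}_1\}$ is linearly independent.
Let $r$ be the largest integer such that


\[ \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\} \]
is linearly independent. Then we have $1 \le r < k$. Moreover, $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_{r+1}\}$ is
linearly dependent. Thus, there are scalars $c_1, c_2, \ldots, c_{r+1}$, not all zero, such that
\[ c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_{r+1}\mathbf{v}_{r+1} = \mathbf{0}. \tag{6.2} \]


Multiplying both sides of (6.2) by $A$ and using
\[ A\mathbf{v}_1 = \lambda_1\mathbf{v}_1, \quad A\mathbf{v}_2 = \lambda_2\mathbf{v}_2, \quad \ldots, \quad A\mathbf{v}_{r+1} = \lambda_{r+1}\mathbf{v}_{r+1}, \]


we deduce

\[ c_1\lambda_1\mathbf{v}_1 + c_2\lambda_2\mathbf{v}_2 + \cdots + c_{r+1}\lambda_{r+1}\mathbf{v}_{r+1} = \mathbf{0}. \tag{6.3} \]
Multiplying both sides of (6.2) by $\lambda_{r+1}$ and subtracting the resulting equation from
(6.3) yields


\[ c_1(\lambda_1 - \lambda_{r+1})\mathbf{v}_1 + c_2(\lambda_2 - \lambda_{r+1})\mathbf{v}_2 + \cdots + c_r(\lambda_r - \lambda_{r+1})\mathbf{v}_r = \mathbf{0}. \]
Since $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ is linearly independent, this equation implies that
\[ c_1(\lambda_1 - \lambda_{r+1}) = c_2(\lambda_2 - \lambda_{r+1}) = \cdots = c_r(\lambda_r - \lambda_{r+1}) = 0. \]
Since $\lambda_1, \lambda_2, \ldots, \lambda_{r+1}$ are distinct, we have
\[ c_1 = c_2 = \cdots = c_r = 0. \tag{6.4} \]


Substituting these values into (6.2) yields
\[ c_{r+1}\mathbf{v}_{r+1} = \mathbf{0}. \]
Note that the eigenvector $\mathbf{v}_{r+1}$ is nonzero and it follows that
\[ c_{r+1} = 0. \tag{6.5} \]


Equations (6.4) and (6.5) contradict the fact that $c_1, c_2, \ldots, c_{r+1}$ are not all zero.
The proof is completed.


As a result of the theorem above, we have the following important theorem.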


Theorem 6.5 Let an $n \times n$ matrix $A$ have $n$ distinct eigenvalues. Then $A$ is
diagonalizable.


Proof Since $A$ has $n$ distinct eigenvalues, choosing one eigenvector for each eigenvalue
and applying Theorem 6.4, we know that there are
$n$ linearly independent eigenvectors of $A$. It follows from Theorem 6.3 that $A$ is
diagonalizable.


6.3 Orthogonal Diagonalization 


In this section, we focus on another problem of finding an orthonormal basis that
consists of eigenvectors of a square matrix. Equivalently, we study a diagonalization
employing an orthogonal matrix. Given $A \in \mathbb{R}^{n \times n}$, does there exist an orthogonal
matrix $P$ such that


\[ P^{-1}AP = P^TAP = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)? \]


This problem is called the orthogonal diagonalization problem. A square
matrix $A$ is called orthogonally diagonalizable if there is an orthogonal matrix
$P$ such that $P^TAP$ is a diagonal matrix.


Theorem 6.6 For an $n \times n$ matrix $A$, the following are equivalent.


(a) $A$ is orthogonally diagonalizable.


(b) $A$ has an orthonormal set of $n$ eigenvectors.


(c) $A$ is symmetric.


Proof (a) $\Rightarrow$ (b): Since $A$ is orthogonally diagonalizable, there is an orthogonal
matrix $P$ such that $P^{-1}AP$ is diagonal. As shown in the proof of Theorem 6.3, the
$n$ column vectors of $P$ are eigenvectors of $A$. Since $P$ is orthogonal, by Theorem 5.18
these column vectors are orthonormal. Hence $A$ has $n$ orthonormal eigenvectors.


(b) $\Rightarrow$ (a): Assume that $A$ has $n$ orthonormal eigenvectors $\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_n$, i.e.,
\[ A\mathbf{p}_j = \lambda_j\mathbf{p}_j, \quad 1 \le j \le n. \]
Construct a matrix with $\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_n$ as column vectors:


\[ P = [\, \mathbf{p}_1 \mid \mathbf{p}_2 \mid \cdots \mid \mathbf{p}_n \,]. \]


We then have

\[ AP = PD, \]
where $P$ is orthogonal and $D = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$. Thus, $P^{-1}AP$ is a diagonal
matrix, i.e., $A$ is orthogonally diagonalizable.


(a) $\Rightarrow$ (c): Since $A$ is orthogonally diagonalizable, there exists an orthogonal matrix
$P$ such that
\[ P^TAP = D, \]


where $D$ is a diagonal matrix. It implies
\[ A = PDP^T. \]
It follows that
\[ A^T = (PDP^T)^T = PD^TP^T = PDP^T = A. \]
Thus, $A$ is symmetric.


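(c) $\Rightarrow$ (a): We prove this by using induction. If $A$ is a $1 \times 1$ matrix, then (c)
obviously implies (a). Suppose that all $(n-1) \times (n-1)$ symmetric matrices are
orthogonally diagonalizable. Now we consider a symmetric matrix $A \in \mathbb{R}^{n \times n}$. Let $\lambda$
be an eigenvalue of $A$ and $\mathbf{v}$ be a corresponding eigenvector. Since $\mathbf{v}$ is a nonzero
vector, we construct a unit vector


\[ \mathbf{v}' = \frac{\mathbf{v}}{\|\mathbf{v}\|}. \]
Then $\mathbf{v}'$ is an eigenvector of $A$ with $\|\mathbf{v}'\| = 1$. We can always find vectors
$\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_{n-1}$ together with $\mathbf{v}'$ to form an orthonormal basis for $\mathbb{R}^n$. Construct a
matrix with $\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_{n-1}$ as column vectors:


\[ Y = [\, \mathbf{y}_1 \mid \mathbf{y}_2 \mid \cdots \mid \mathbf{y}_{n-1} \,] \in \mathbb{R}^{n \times (n-1)}. \]
We then have


\[ Y^TY = \begin{bmatrix} \mathbf{y}_1^T \\ \mathbf{y}_2^T \\ \vdots \\ \mathbf{y}_{n-1}^T \end{bmatrix} [\, \mathbf{y}_1 \mid \mathbf{y}_2 \mid \cdots \mid \mathbf{y}_{n-1} \,] = \begin{bmatrix} \mathbf{y}_1^T\mathbf{y}_1 & \mathbf{y}_1^T\mathbf{y}_2 & \cdots & \mathbf{y}_1^T\mathbf{y}_{n-1} \\ \mathbf{y}_2^T\mathbf{y}_1 & \mathbf{y}_2^T\mathbf{y}_2 & \cdots & \mathbf{y}_2^T\mathbf{y}_{n-1} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{y}_{n-1}^T\mathbf{y}_1 & \mathbf{y}_{n-1}^T\mathbf{y}_2 & \cdots & \mathbf{y}_{n-1}^T\mathbf{y}_{n-1} \end{bmatrix} = I_{n-1} \]
and
\[ (\mathbf{v}')^TY = [\, (\mathbf{v}')^T\mathbf{y}_1 \ \ (\mathbf{v}')^T\mathbf{y}_2 \ \ \cdots \ \ (\mathbf{v}')^T\mathbf{y}_{n-1} \,] = \mathbf{0}^T \in \mathbb{R}^{1 \times (n-1)}. \tag{6.6} \]


Note that the matrix $Y^TAY \in \mathbb{R}^{(n-1) \times (n-1)}$ is symmetric. Then by the inductive
hypothesis there exists an orthogonal matrix $P \in \mathbb{R}^{(n-1) \times (n-1)}$ such that


\[ P^TY^TAYP = D \tag{6.7} \]
is diagonal. Constructing
\[ B = [\, \mathbf{v}' \mid YP \,] \in \mathbb{R}^{n \times n}, \]
we therefore have by using (6.6),


\[ B^TB = \begin{bmatrix} (\mathbf{v}')^T\mathbf{v}' & (\mathbf{v}')^TYP \\ ((\mathbf{v}')^TYP)^T & (YP)^TYP \end{bmatrix} = \begin{bmatrix} 1 & \mathbf{0}^T \\ \mathbf{0} & I_{n-1} \end{bmatrix}. \]


Thus, $B$ is orthogonal. By (6.6) and (6.7), we obtain


\[ (\mathbf{v}')^TAYP = (A^T\mathbf{v}')^TYP = (A\mathbf{v}')^TYP = \lambda(\mathbf{v}')^TYP = \mathbf{0}^T \tag{6.8} \]
and
\[ (YP)^TAYP = P^TY^TAYP = D. \tag{6.9} \]
Finally, it follows from (6.8) and (6.9) that
\[ B^TAB = \begin{bmatrix} (\mathbf{v}')^TA\mathbf{v}' & (\mathbf{v}')^TAYP \\ ((\mathbf{v}')^TAYP)^T & (YP)^TAYP \end{bmatrix} = \begin{bmatrix} (\mathbf{v}')^TA\mathbf{v}' & \mathbf{0}^T \\ \mathbf{0} & D \end{bmatrix}, \]


i.e., $B^TAB$ is diagonal. The proof is completed.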


Before we develop a procedure of orthogonal diagonalization, we need the 
following important theorem about eigenvalues and eigenvectors of symmetric 
matrices. We defer the proof of the theorem until Chapter 8 (see Theorem 8.12). 
Theorem 6.7 Let $A$ be a symmetric matrix. Then


(a) The eigenvalues of $A$ are all real.
(b) Eigenvectors from different eigenspaces are orthogonal.


Therefore, combining the above theorems, we provide the following method for
orthogonally diagonalizing a symmetric matrix $A$.


(1) Find a basis for each eigenspace of $A$.


(2) Apply the Gram-Schmidt process to each of these bases to obtain an orthonormal
basis for each eigenspace.


(3) Find the matrix $P$ whose columns are the basis vectors constructed in (2).
Then $P^TAP$ is diagonal.


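Example Find a matrix $P$ that orthogonally diagonalizes


\[ A = \begin{bmatrix} 0 & -1 & 1 \\ -1 & 0 & -1 \\ 1 & -1 & 0 \end{bmatrix}. \]


Solution The characteristic equation of $A$ is


\[ \det(\lambda I - A) = (\lambda + 1)^2(\lambda - 2) = 0, \]


and the distinct eigenvalues are $\lambda = -1$ and $\lambda = 2$. The following are bases for the
eigenspaces:
\[ \lambda = -1,\ \mathbf{x}_1 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix},\ \mathbf{x}_2 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}; \quad \lambda = 2,\ \mathbf{x}_3 = \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}. \]


There are three basis vectors in total. Applying the Gram-Schmidt process to each
of these bases, we have the following orthonormal bases for the eigenspaces:


\[ \lambda = -1,\ \mathbf{p}_1 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix},\ \mathbf{p}_2 = \frac{1}{\sqrt{6}} \begin{bmatrix} -1 \\ 1 \\ 2 \end{bmatrix}; \quad \lambda = 2,\ \mathbf{p}_3 = \frac{1}{\sqrt{3}} \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}. \]
Construct an orthogonal matrix


\[ P = [\, \mathbf{p}_1 \mid \mathbf{p}_2 \mid \mathbf{p}_3 \,] = \begin{bmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}} & -\frac{1}{\sqrt{3}} \\ 0 & \frac{2}{\sqrt{6}} & \frac{1}{\sqrt{3}} \end{bmatrix}. \]


It is easy to verify


\[ P^TAP = \operatorname{diag}(-1, -1, 2). \]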
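

The three steps of the procedure can be sketched in code for this example. The sketch assumes the eigenspace bases $[1, 1, 0]^T$, $[-1, 0, 1]^T$ for $\lambda = -1$ and $[1, -1, 1]^T$ for $\lambda = 2$ (any other bases of the same eigenspaces would serve equally well); the helper names are illustrative, and floating-point results are compared with a small tolerance.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mat_vec(A, x):
    return [dot(row, x) for row in A]


def gram_schmidt(vectors):
    """Classical Gram-Schmidt: return an orthonormal basis of span(vectors)."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(v, q)                              # component of v along q
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        basis.append([wi / norm for wi in w])
    return basis


A = [[0, -1, 1],
     [-1, 0, -1],
     [1, -1, 0]]

# Step (1): a basis for each eigenspace (assumed as stated in the lead-in).
eigenspaces = {-1: [[1, 1, 0], [-1, 0, 1]],
               2: [[1, -1, 1]]}

# Steps (2) and (3): orthonormalize each basis and collect the columns of P.
columns, eigenvalues = [], []
for lam, basis in eigenspaces.items():
    for q in gram_schmidt(basis):
        columns.append(q)
        eigenvalues.append(lam)

# Entry (i, j) of P^T A P is p_i . (A p_j); it should equal diag(-1, -1, 2).
n = len(A)
PtAP = [[dot(columns[i], mat_vec(A, columns[j])) for j in range(n)]
        for i in range(n)]
for i in range(n):
    for j in range(n):
        expected = eigenvalues[i] if i == j else 0
        assert abs(PtAP[i][j] - expected) < 1e-12
```

Gram-Schmidt is applied within each eigenspace separately; vectors from different eigenspaces are already orthogonal by Theorem 6.7 (b).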


6.4 Jordan Decomposition Theorem 


The following theorem is fundamental and useful in linear algebra. 


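Theorem 6.8 (Jordan Decomposition Theorem) Let $A$ be any $n \times n$ matrix. Then
there exists an invertible matrix $X$ such that


\[ X^{-1}AX = J := \begin{bmatrix} J_1 & 0 & \cdots & 0 \\ 0 & J_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & J_p \end{bmatrix}, \]


where $J_i$ is an $n_i \times n_i$ matrix for $1 \le i \le p$ given by


\[ J_i = \begin{bmatrix} \lambda_i & 1 & 0 & \cdots & 0 \\ 0 & \lambda_i & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_i & 1 \\ 0 & 0 & \cdots & 0 & \lambda_i \end{bmatrix} \]


with $\lambda_i$ ($1 \le i \le p$) being the eigenvalues of $A$ (not necessarily distinct) and $n_1 + n_2 + \cdots + n_p = n$. The
matrix $J$ is called the Jordan canonical form of $A$ and $J_i$ ($1 \le i \le p$) are called
Jordan blocks. The Jordan canonical form of $A$ is unique up to the permutation
of diagonal Jordan blocks.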


The proof of the Jordan decomposition theorem is beyond the scope of this text. 
We refer the interested readers to [14, pp. 164-171]. 
Remark It is well-known that there exist some square matrices which are not
diagonalizable. However, the Jordan decomposition theorem tells us that for every
square matrix $A$, there exists an invertible matrix $X$ such that $X^{-1}AX$ is a
bi-diagonal matrix.


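Example 1 Let


\[ J = \begin{bmatrix} 2 & 1 & 0 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 & 0 & 0 \\ 0 & 0 & 4 & 0 & 0 & 0 \\ 0 & 0 & 0 & 3 & 1 & 0 \\ 0 & 0 & 0 & 0 & 3 & 1 \\ 0 & 0 & 0 & 0 & 0 & 3 \end{bmatrix} \]


be the Jordan canonical form of a $6 \times 6$ matrix $A$. Then the Jordan blocks are
\[ J_1 = \begin{bmatrix} 2 & 1 \\ 0 & 2 \end{bmatrix}, \quad J_2 = [4], \quad J_3 = \begin{bmatrix} 3 & 1 & 0 \\ 0 & 3 & 1 \\ 0 & 0 & 3 \end{bmatrix}. \]
Moreover, from the Jordan canonical form of $A$, we find that $A$ has three distinct
eigenvalues $\lambda = 2$, $\lambda = 4$, and $\lambda = 3$.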


Example 2 Let


\[ A = \begin{bmatrix} 1 & -3 & -2 \\ -1 & 1 & -1 \\ 2 & 4 & 5 \end{bmatrix}. \]


Is it diagonalizable? Find its Jordan canonical form.
Solution The characteristic equation of $A$ is
\[ \det(\lambda I - A) = (\lambda - 2)^2(\lambda - 3) = 0. \]
Therefore, the distinct eigenvalues and the corresponding eigenvectors of $A$ are
\[ \lambda = 2,\ \mathbf{p}_1 = [-1, -1, 2]^T; \quad \lambda = 3,\ \mathbf{p}_2 = [-1, 0, 1]^T. \]


Since $A$ is a $3 \times 3$ matrix and has only two linearly independent eigenvectors, $A$ is
not diagonalizable. However, we can find an invertible matrix $X$
such that
\[ X^{-1}AX = \begin{bmatrix} 2 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix} \]


is a bi-diagonal matrix (the Jordan canonical form of $A$).
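

One possible choice of $X$ can be checked as follows. The sketch takes the columns of $X$ to be $\mathbf{p}_1$, a vector $\mathbf{w}$ with $(A - 2I)\mathbf{w} = \mathbf{p}_1$, and $\mathbf{p}_2$. The vector $\mathbf{w} = [1, 0, 0]^T$ is an assumption made here for illustration (it is determined only up to adding multiples of $\mathbf{p}_1$), so the resulting $X$ need not coincide with the matrix intended in the example. The check uses $AX = XJ$, which is equivalent to $X^{-1}AX = J$ when $X$ is invertible.

```python
def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


A = [[1, -3, -2],
     [-1, 1, -1],
     [2, 4, 5]]

# Columns: p1 = [-1, -1, 2], w = [1, 0, 0] (assumed choice), p2 = [-1, 0, 1].
X = [[-1, 1, -1],
     [-1, 0, 0],
     [2, 0, 1]]

J = [[2, 1, 0],
     [0, 2, 0],
     [0, 0, 3]]

assert det3(X) != 0                      # X is invertible
assert mat_mul(A, X) == mat_mul(X, J)    # AX = XJ, hence X^{-1} A X = J
```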


Exercises 


Elementary exercises 


6.1 Find the eigenvalues and bases for the eigenspaces of the following matrices. 


1 -2 
14 501 1 000 
o las] mM) 128}. Aaaa 
0 0 0 2 

6.2 For a positive integer k > 2, compute 
k k 010]° 010]° 
2 1 A 1 
oF 3]. m[aa]- © 0 0 1 (d) 0 0 
0 0 0 1 0 0 


6.3 Find the eigenvalues and bases for the eigenspaces of $A^3$, where


=1 2 = 2, 
A= 1 2 1 
—1 -l 


6.4 Suppose that 


(a) Find Alb. 
(b) Find A?° and check whether your solution of (a) is true or not. 
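6.5 Find a matrix $A \in \mathbb{R}^{3 \times 3}$ such that
\[ A\mathbf{u}_1 = \mathbf{u}_1, \quad A\mathbf{u}_2 = 2\mathbf{u}_2, \quad A\mathbf{u}_3 = 3\mathbf{u}_3, \]


where $\mathbf{u}_1 = [1, 2, 2]^T$, $\mathbf{u}_2 = [2, -2, 1]^T$, and $\mathbf{u}_3 = [-2, -1, 2]^T$.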
6.6 Show that if


\[ A = \begin{bmatrix} 0 & 0 & 1 \\ a & 1 & b \\ 1 & 0 & 0 \end{bmatrix} \]


has three linearly independent eigenvectors, then $a + b = 0$.


6.7 Show that if $\lambda$ is an eigenvalue of an invertible matrix $A$ and $\mathbf{x}$ is a corresponding
eigenvector, then $1/\lambda$ is an eigenvalue of $A^{-1}$ and $\mathbf{x}$ is a corresponding eigenvector.


6.8 Show that if $\lambda$ is an eigenvalue of a matrix $A$, $\mathbf{x}$ is a corresponding eigenvector,
and $a$ is a scalar, then $\lambda - a$ is an eigenvalue of $A - aI$ and $\mathbf{x}$ is a corresponding
eigenvector.


6.9 Show that if $\lambda$ is an eigenvalue of an invertible matrix $A$ and $\mathbf{x}$ is a corresponding
eigenvector, then $\det(A)/\lambda$ is an eigenvalue of $\operatorname{adj}(A)$ and $\mathbf{x}$ is a corresponding
eigenvector.


6.10 Let 
21 1 
A=|1 3 2 
1 2 4 
Find the eigenvalues and bases for the eigenspaces of $A$, $A^{-1}$, $A - 2I$, and $A + 3I$,
where $I$ is the $3 \times 3$ identity matrix.


6.11 Let $A \in \mathbb{R}^{n \times n}$. Show that if $\lambda$ is an eigenvalue of $A$, then $\lambda^3 + 3\lambda^2 - 2\lambda + 5$
is an eigenvalue of the matrix $A^3 + 3A^2 - 2A + 5I$, where $I$ is the $n \times n$ identity
matrix.


6.12 Let
\[ A = \begin{bmatrix} 2 & 0 & 1 \\ 3 & 1 & a \\ 4 & 0 & 5 \end{bmatrix}. \]


Find the value of $a$ such that $A$ is diagonalizable.


6.13 Let $A \in \mathbb{R}^{n \times n}$ with $A^m = 0$ for some $m > 1$. Show that if $A$ is diagonalizable,
then $A$ must be the zero matrix.


6.14 Determine whether $A$ is diagonalizable. If so, find an invertible matrix $P$ that
diagonalizes $A$, and determine $P^{-1}AP$.


af 4 A 1 2 2 PIR 

(a) A=| -3 4 0 (b) A=] 2 1 2 (c) A=| , he sae 
yak AS 2 2 1 

0 00 3 
6.15 Let $A$ be a diagonalizable matrix.


(a) Show that $A^T$ is also a diagonalizable matrix.


(b) Show that if $A$ is invertible, then $A^{-1}$ is diagonalizable.


6.16 Find A? and Aĉ, where 


6.17 Show that if $A$ is a symmetric matrix, then all eigenvalues of $A$ are nonnegative
if and only if there exists a symmetric matrix $B$ such that $A = B^2$.


6.18 Find a matrix P that orthogonally diagonalizes each of the following matrices. 


-2 0 -36 
6 —2 

(a) ; “a (b) 0 -3 0 

-36 0 -23 


6.19 If $b \neq 0$, find a matrix $P$ that orthogonally diagonalizes


\[ A = \begin{bmatrix} a & b \\ b & a \end{bmatrix}. \]


6.20 Let $A, B \in \mathbb{R}^{n \times n}$ be two orthogonally diagonalizable matrices.


(a) Show that A + B is orthogonally diagonalizable. 


(b) Show that if AB = BA, then AB is orthogonally diagonalizable. 


6.21 Show that if $\mathbf{v}$ is any $n \times 1$ matrix and $I$ is the $n \times n$ identity matrix, then
$I - \mathbf{v}\mathbf{v}^T$ is orthogonally diagonalizable.


Challenge exercises 


6.22 Find det(A) given that A has p(A) as its characteristic polynomial. 
(a) p(A) = A8 4+247-A+4. (b) p(A) = àf +38 +6. 
6.23 Show that the characteristic equation of $A \in \mathbb{R}^{2 \times 2}$ can be expressed as


\[ \lambda^2 - \operatorname{tr}(A)\lambda + \det(A) = 0. \]


6.24 Let $A \in \mathbb{R}^{n \times n}$. Show that $A$ and $A^T$ have the same eigenvalues but may not
have the same eigenspaces.


6.25 Let 


Show that for any positive integer k > 2, 

\[ \operatorname{tr}(A^k) = \operatorname{tr}(A^{k-1}) + \operatorname{tr}(A^{k-2}). \]
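

6.26 Let $\lambda_1$ and $\lambda_2$ be two distinct eigenvalues of a matrix $A$, and let $\mathbf{v}_1$ and $\mathbf{v}_2$
be eigenvectors of $A$ corresponding to $\lambda_1$ and $\lambda_2$, respectively. Show that $\mathbf{v}_1 + \mathbf{v}_2$ is
not an eigenvector of $A$.


6.27 Let $\lambda_1$ and $\lambda_2$ be two distinct eigenvalues of a matrix $A$, and let $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_s$
and $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_t$ be linearly independent eigenvectors of $A$ corresponding to $\lambda_1$
and $\lambda_2$, respectively. Show that the set $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_s, \mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_t\}$ is linearly
independent.


6.28 Let $A = \mathbf{u}\mathbf{v}^T$, where $\mathbf{u} = [1, 2, 3]^T$ and $\mathbf{v} = [4, 5, 6]^T$. Find $A^n$, where $n$ is an
integer and $n > 1$.

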
6.29 Show that if A is diagonalizable, then rank(A) is equal to the number of 
nonzero eigenvalues of A. 


6.30 Let $E_n = [e_{ij}] \in \mathbb{R}^{n \times n}$, where $e_{ij} = 1$ for all $i, j$. Find the eigenvalues and
corresponding eigenvectors of 


En 0 
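

6.31 Let $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times m}$, where $m > n$. Show that
\[ \det(\lambda I_m - AB) = \lambda^{m-n} \cdot \det(\lambda I_n - BA). \]


6.32 Let $\mathbf{u}, \mathbf{v} \in \mathbb{R}^n$ be nonzero column vectors orthogonal to each other. Find all
eigenvalues of $A = \mathbf{u}\mathbf{v}^T$ and corresponding eigenvectors.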


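6.33 Prove the Cayley-Hamilton theorem [14, pp. 109-111]: If $A \in \mathbb{R}^{n \times n}$ with
characteristic equation


\[ \lambda^n + c_{n-1}\lambda^{n-1} + \cdots + c_1\lambda + c_0 = 0, \]


where $c_0, c_1, \ldots, c_{n-1} \in \mathbb{R}$, then


\[ A^n + c_{n-1}A^{n-1} + \cdots + c_1A + c_0I = 0, \]


where $I$ is the $n \times n$ identity matrix.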


Chapter 7 


Linear Transformations 


“We do not need magic to transform our world.” 


— Joanne Rowling 


“ Mathematics compares the most diverse phenomena and discovers the secret analogies 
that unite them.” 
— Joseph Fourier 


In Chapter 3, we introduced linear transformations from $\mathbb{R}^n$ to $\mathbb{R}^m$. In this chapter,
we will study linear transformations between general vector spaces. The results
obtained here have many important applications in science and engineering.


7.1 General Linear Transformations 


In Section 3.2, we defined and studied linear transformations from $\mathbb{R}^n$ to $\mathbb{R}^m$. In this
section, we will define and study the more general concept of a linear transformation
from a general vector space to another.


7.1.1 Introduction to linear transformations 


Theorem 3.7 gives the linearity conditions for linear transformations
from $\mathbb{R}^n$ to $\mathbb{R}^m$. We use these conditions as the starting point to define general
linear transformations.


Definition Let $T: V \to W$ be a function from a vector space $V$ to a vector space
$W$. Then $T$ is called a linear transformation from $V$ to $W$ if for all vectors $\mathbf{u}$
and $\mathbf{v}$ in $V$ and all scalars $k$:
(i) $T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v})$.


(ii) $T(k\mathbf{u}) = kT(\mathbf{u})$.


Examples


(a) Let $p = p(x) = c_0 + c_1x + \cdots + c_nx^n$ be a polynomial in $P_n$, and define the
function $T: P_n \to P_{n+1}$ by


\[ T(p) = T(p(x)) = xp(x) = c_0x + c_1x^2 + \cdots + c_nx^{n+1}. \]
For any polynomials $p_1, p_2 \in P_n$ and any scalar $k$, we have


\[ T(p_1 + kp_2) = T(p_1(x) + kp_2(x)) = x(p_1(x) + kp_2(x)) = xp_1(x) + kxp_2(x) = T(p_1) + kT(p_2). \]


Thus, $T$ is a linear transformation.


(b) Let $V$ be an inner product space and $\mathbf{v}_0 \in V$ be any fixed vector. Let $T: V \to \mathbb{R}$
be the transformation that maps a vector $\mathbf{v}$ into its inner product with $\mathbf{v}_0$, i.e.,


\[ T(\mathbf{v}) = \langle \mathbf{v}, \mathbf{v}_0 \rangle. \]


From the axioms of an inner product, we have for any $\mathbf{u}, \mathbf{v} \in V$ and any scalar
$k$,


\[ T(\mathbf{u} + k\mathbf{v}) = \langle \mathbf{u} + k\mathbf{v}, \mathbf{v}_0 \rangle = \langle \mathbf{u}, \mathbf{v}_0 \rangle + k\langle \mathbf{v}, \mathbf{v}_0 \rangle = T(\mathbf{u}) + kT(\mathbf{v}). \]
Hence $T$ is a linear transformation.


(c) Consider the trace defined on $\mathbb{R}^{n \times n}$. For $A = [a_{ij}]$, $B = [b_{ij}] \in \mathbb{R}^{n \times n}$ and any
scalar $k$, we have


\[ \operatorname{tr}(A + kB) = \sum_{i=1}^{n} (a_{ii} + kb_{ii}) = \sum_{i=1}^{n} a_{ii} + k\sum_{i=1}^{n} b_{ii} = \operatorname{tr}(A) + k\operatorname{tr}(B). \]


Thus, $\operatorname{tr}(\cdot)$ is a linear transformation.


(d) Let $V = C^1(-\infty, \infty)$ be the vector space of all functions with continuous first
derivatives on $(-\infty, \infty)$ and $W = C(-\infty, \infty)$ be the vector space of continuous
functions defined on $(-\infty, \infty)$. Let $D: V \to W$ be the transformation that
maps a function $f = f(x)$ into its derivative, i.e.,


\[ D(f) = f'(x) = \frac{df(x)}{dx}. \]


It follows from the properties of differentiation that for any $f = f(x)$, $g = g(x) \in V$
and any scalar $k$,


\[ D(f + kg) = (f(x) + kg(x))' = f'(x) + kg'(x) = D(f) + kD(g). \]


Thus, $D$ is a linear transformation.


(e) Let $V = C(-\infty, \infty)$ be the vector space of all continuous functions on
$(-\infty, \infty)$, $W = C^1(-\infty, \infty)$ be the vector space of functions with continuous
first derivatives on $(-\infty, \infty)$, and $J: V \to W$ be the transformation that maps
$f = f(x)$ into its integral, i.e.,


\[ J(f) = \int_0^x f(t)\,dt. \]


It follows from the properties of integration that for any $f = f(x)$, $g = g(x) \in V$
and any scalar $k$,


\[ J(f + kg) = \int_0^x (f(t) + kg(t))\,dt = \int_0^x f(t)\,dt + k\int_0^x g(t)\,dt = J(f) + kJ(g). \]


Hence $J$ is a linear transformation.
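

Examples (a) and (d) can be tried out on polynomials, which have continuous derivatives of all orders. In the sketch below a polynomial $c_0 + c_1x + \cdots + c_nx^n$ is stored as the coefficient list `[c0, c1, ..., cn]`; this representation and the helper names `add`, `scale`, `T`, and `D` are assumptions made for illustration.

```python
def add(p, q):
    """Sum of two polynomials given as coefficient lists."""
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
            for i in range(n)]


def scale(k, p):
    """Scalar multiple k * p."""
    return [k * c for c in p]


def T(p):
    """Example (a): p(x) -> x p(x), a shift of the coefficients."""
    return [0] + p


def D(p):
    """Example (d) restricted to polynomials: p(x) -> p'(x)."""
    return [i * c for i, c in enumerate(p)][1:] or [0]


p1 = [1, 2, 3]          # 1 + 2x + 3x^2
p2 = [4, 0, -1, 2]      # 4 - x^2 + 2x^3
k = 3

# Both maps satisfy F(p1 + k p2) = F(p1) + k F(p2) on this sample.
assert T(add(p1, scale(k, p2))) == add(T(p1), scale(k, T(p2)))
assert D(add(p1, scale(k, p2))) == add(D(p1), scale(k, D(p2)))
```

A check on sample inputs does not prove linearity; it only illustrates the identities established in the examples.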


The following theorem lists some basic properties that hold for all linear 
transformations. 


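Theorem 7.1 Let $T: V \to W$ be a linear transformation. Then


(a) $T(\mathbf{0}) = \mathbf{0}$.


(b) $T(-\mathbf{v}) = -T(\mathbf{v})$ for all $\mathbf{v}$ in $V$.


(c) $T(\mathbf{v} - \mathbf{w}) = T(\mathbf{v}) - T(\mathbf{w})$ for all $\mathbf{v}$ and $\mathbf{w}$ in $V$.
(d) $T\left(\sum_{i=1}^{n} k_i\mathbf{v}_i\right) = \sum_{i=1}^{n} k_iT(\mathbf{v}_i)$ for all $\mathbf{v}_i$ in $V$ and all $k_i$ in $\mathbb{R}$ ($1 \le i \le n$).


Proof For (a), we have, for any $\mathbf{u}$ in $V$,


\[ T(\mathbf{0}) = T(0\mathbf{u}) = 0T(\mathbf{u}) = \mathbf{0}. \]


One can prove the remaining parts easily.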


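Let $T: V \to W$ be a linear transformation and $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ be any basis for
$V$. Then for any vector $\mathbf{v} \in V$, $T(\mathbf{v})$, the image of $\mathbf{v}$ under $T$, can be calculated
from $T(\mathbf{v}_1), T(\mathbf{v}_2), \ldots, T(\mathbf{v}_n)$, which are the images of the basis vectors. In fact, let


\[ \mathbf{v} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_n\mathbf{v}_n. \]
It follows from Theorem 7.1 (d) that


\[ T(\mathbf{v}) = c_1T(\mathbf{v}_1) + c_2T(\mathbf{v}_2) + \cdots + c_nT(\mathbf{v}_n). \]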
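

As a small worked illustration (the transformation and the basis here are hypothetical, chosen only to show the computation), suppose $T: \mathbb{R}^2 \to \mathbb{R}^3$ is linear, the basis is $\mathbf{v}_1 = [1, 1]$, $\mathbf{v}_2 = [1, -1]$, and the images $T(\mathbf{v}_1) = [2, 0, 1]$ and $T(\mathbf{v}_2) = [0, 1, -1]$ are known. Then the image of $\mathbf{v} = [3, 1]$ follows from its coordinates relative to the basis:

```latex
\begin{align*}
\mathbf{v} = [3, 1] &= 2\,\mathbf{v}_1 + 1\,\mathbf{v}_2, \\
T(\mathbf{v}) &= 2\,T(\mathbf{v}_1) + 1\,T(\mathbf{v}_2)
             = 2\,[2, 0, 1] + [0, 1, -1] = [4, 1, 1].
\end{align*}
```

No formula for $T$ itself is needed; the two images of the basis vectors determine $T$ on all of $\mathbb{R}^2$.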
7.1.2 Compositions


In Subsection 3.2.3, we defined the composition of linear transformations on the
Euclidean vector spaces. We extend that concept to general linear transformations.


Definition Let $T_1: U \to V$ and $T_2: V \to W$ be linear transformations. The composition
of $T_2$ with $T_1$, denoted by $T_2 \circ T_1$, is the function defined for any vector $\mathbf{u}$ in $U$
by the formula

\[ (T_2 \circ T_1)(\mathbf{u}) = T_2(T_1(\mathbf{u})). \]


The following theorem shows that the composition of two linear transformations
is still a linear transformation.


Theorem 7.2 Let $T_1: U \to V$ and $T_2: V \to W$ be linear transformations. Then
\[ T_2 \circ T_1: U \to W \]
is also a linear transformation.


Proof Since $T_1$ and $T_2$ are linear transformations, we have for any $\mathbf{u}, \mathbf{v} \in U$ and
$k \in \mathbb{R}$,
\begin{align*}
(T_2 \circ T_1)(\mathbf{u} + k\mathbf{v}) &= T_2(T_1(\mathbf{u} + k\mathbf{v})) = T_2(T_1(\mathbf{u}) + kT_1(\mathbf{v})) \\
&= T_2(T_1(\mathbf{u})) + kT_2(T_1(\mathbf{v})) \\
&= (T_2 \circ T_1)(\mathbf{u}) + k(T_2 \circ T_1)(\mathbf{v}).
\end{align*}


Thus, $T_2 \circ T_1$ is a linear transformation from $U$ to $W$.


Example Let $T_1: P_1 \to P_2$ and $T_2: P_2 \to P_2$ be the linear transformations given
by the formulas


\[ T_1(p(x)) = xp(x) \quad \text{and} \quad T_2(p(x)) = p(3x + 2). \]
Then the composition $T_2 \circ T_1: P_1 \to P_2$ is given by the formula
\[ (T_2 \circ T_1)(p(x)) = T_2(T_1(p(x))) = T_2(xp(x)) = (3x + 2)p(3x + 2). \]
In particular, if $p(x) = c_0 + c_1x$, then
\begin{align*}
(T_2 \circ T_1)(p(x)) = (T_2 \circ T_1)(c_0 + c_1x) &= (3x + 2)(c_0 + c_1(3x + 2)) \\
&= c_0(3x + 2) + c_1(3x + 2)^2.
\end{align*}


If $T: V \to V$ is any linear transformation and if $I: V \to V$ is the identity
transformation, then for all vectors $\mathbf{v} \in V$,


\[ (T \circ I)(\mathbf{v}) = T(I(\mathbf{v})) = T(\mathbf{v}), \quad (I \circ T)(\mathbf{v}) = I(T(\mathbf{v})) = T(\mathbf{v}). \]
It follows that
\[ T \circ I = T, \quad I \circ T = T. \]


We conclude this section by noting that compositions can be defined for more
than two linear transformations. For example, let $V_0$, $V_1$, $V_2$, and $V_3$ be vector
spaces. If $T_1: V_0 \to V_1$, $T_2: V_1 \to V_2$, and $T_3: V_2 \to V_3$ are linear transformations,
then the composition $T_3 \circ T_2 \circ T_1$ defined by


\[ (T_3 \circ T_2 \circ T_1)(\mathbf{v}) = T_3(T_2(T_1(\mathbf{v}))) \]


for $\mathbf{v} \in V_0$, is a linear transformation from $V_0$ to $V_3$. See Figure 7.1.


[Figure 7.1: the composition $T_3 \circ T_2 \circ T_1$]


In general, if $T_j$ is a linear transformation from the vector space $V_{j-1}$ to another
vector space $V_j$ for $1 \le j \le n$, then the composition $T_n \circ T_{n-1} \circ \cdots \circ T_2 \circ T_1$, defined
by

\[ (T_n \circ T_{n-1} \circ \cdots \circ T_2 \circ T_1)(\mathbf{v}) = T_n(T_{n-1}(\cdots(T_2(T_1(\mathbf{v})))\cdots)) \]


for $\mathbf{v} \in V_0$, is a linear transformation from $V_0$ to $V_n$.


7.2 Kernel and Range 


In this section, we develop some fundamental properties of linear transformations. 


7.2.1 Kernel and range 


Definition Let $T: V \to W$ be a linear transformation. Then the set of vectors in
$V$ that $T$ maps into $\mathbf{0}$ is called the kernel of $T$, denoted by $\ker(T)$. The set of all
vectors in $W$ that are images under $T$ of at least one vector in $V$ is called the range
of $T$, denoted by $R(T)$.


Note that $\ker(T) \subseteq V$ and $R(T) \subseteq W$.
Examples
(a) If $T_A: \mathbb{R}^n \to \mathbb{R}^m$ is defined by $T_A(\mathbf{x}) = A\mathbf{x}$, where $A$ is an $m \times n$ matrix and
$\mathbf{x} \in \mathbb{R}^n$, then $\ker(T_A)$ is the nullspace of $A$, and $R(T_A)$ is the column space of
$A$.
(b) Let $T: \mathbb{R}^3 \to \mathbb{R}^3$ be the orthogonal projection on the $xy$-plane. The kernel of
$T$ is the set of points that $T$ maps into $\mathbf{0} = [0, 0, 0]$. These are the points on the
$z$-axis (Figure 7.2). Since $T$ maps every point in $\mathbb{R}^3$ into the $xy$-plane, the range
of $T$ must be in this plane. However, every point $[x_0, y_0, 0]$ in the $xy$-plane is
the image under $T$ of some point. Actually, it is the image of all points on
the vertical line that passes through $[x_0, y_0, 0]$ (Figure 7.3). Thus, $R(T)$ is the
entire $xy$-plane.


[Figure 7.2: $\ker(T)$ is the $z$-axis]
[Figure 7.3: $R(T)$ is the entire $xy$-plane]


(c) Let $V = C^1[a, b]$ be the vector space of functions with continuous first
derivatives on $[a, b]$ and $W = C[a, b]$ be the vector space of continuous functions
on $[a, b]$. Let $D: V \to W$ be the differentiation transformation


\[ D(f) = f'(x) = \frac{df(x)}{dx}, \]


where $f = f(x) \in V$. The kernel of $D$ is the set of functions in $V$ with
derivative zero. From calculus, this is the set of all constant functions on $[a, b]$.
The range of $D$ is given by $R(D) = W = C[a, b]$.


In all the examples above, $\ker(T)$ and $R(T)$ turned out to be subspaces. This is
actually a consequence of the following result.


Theorem 7.3 Let $T: V \to W$ be a linear transformation. Then


(a) The kernel of $T$ is a subspace of $V$.
(b) The range of $T$ is a subspace of $W$.


Proof For (a), let $\mathbf{u}, \mathbf{v} \in \ker(T)$ and $k \in \mathbb{R}$. We have
\[ T(\mathbf{u} + k\mathbf{v}) = T(\mathbf{u}) + kT(\mathbf{v}) = \mathbf{0} + k\mathbf{0} = \mathbf{0}. \]
Thus, $\mathbf{u} + k\mathbf{v} \in \ker(T)$, i.e., $\ker(T)$ is a subspace by Theorem 4.2.


For (b), let $\mathbf{u}, \mathbf{v} \in R(T)$ and $k \in \mathbb{R}$. We know that there are $\mathbf{p}, \mathbf{q} \in V$ such that
\[ T(\mathbf{p}) = \mathbf{u}, \quad T(\mathbf{q}) = \mathbf{v}. \]


It follows that

\[ T(\mathbf{p} + k\mathbf{q}) = T(\mathbf{p}) + kT(\mathbf{q}) = \mathbf{u} + k\mathbf{v}, \]
where $\mathbf{p} + k\mathbf{q} \in V$. Thus, $\mathbf{u} + k\mathbf{v} \in R(T)$, i.e., $R(T)$ is a subspace by Theorem 4.2
again.


7.2.2 Rank and nullity 


Definition Let $T\colon V \to W$ be a linear transformation. Then the dimension of the
range of T is called the rank of T and is denoted by rank(T); the dimension of the
kernel is called the nullity of T and is denoted by nullity(T).


Let $T_A\colon R^n \to R^m$ be multiplication by $A \in R^{m \times n}$. Then we have the following
relationship between the rank and nullity of the matrix A and the rank and nullity
of the corresponding linear transformation $T_A$.


Theorem 7.4 Let A be an $m \times n$ matrix and $T_A$ be the matrix transformation from
$R^n$ to $R^m$. Then
(a) nullity($T_A$) = nullity(A).
(b) rank($T_A$) = rank(A).

Proof For (a), we have

\[ \mathrm{nullity}(T_A) = \dim(\ker(T_A)) = \dim(\text{nullspace of } A) = \mathrm{nullity}(A). \]

For (b), we have

\[ \mathrm{rank}(T_A) = \dim(R(T_A)) = \dim(\text{column space of } A) = \mathrm{rank}(A). \]


7.2.3 Dimension theorem for linear transformations 


Theorem 7.5 (Dimension Theorem for Linear Transformations) Let $T\colon V \to W$
be a linear transformation from an n-dimensional vector space V to a vector space
W. Then

\[ \mathrm{rank}(T) + \mathrm{nullity}(T) = n. \]

Proof We only prove the case $1 \le \dim(\ker(T)) < n$. The cases
$\dim(\ker(T)) = 0$ and $\dim(\ker(T)) = n$ are left as an exercise. We must show that

\[ \dim(R(T)) + \dim(\ker(T)) = n. \]

Assume $\dim(\ker(T)) = r$, and let

\[ \{v_1, v_2, \ldots, v_r\} \]

be a basis for the kernel. Since $\{v_1, v_2, \ldots, v_r\}$ is linearly independent, Theorem
4.12 (b) states that there are $n - r$ vectors $v_{r+1}, \ldots, v_n$ such that

\[ \{v_1, v_2, \ldots, v_r, v_{r+1}, \ldots, v_n\} \]

is a basis for V. We want to show that the $n - r$ vectors in $S = \{T(v_{r+1}), \ldots, T(v_n)\}$
form a basis for R(T).


First, we show that S spans R(T). If w is any vector in R(T), then $w = T(v)$
for some vector v in V. Since $\{v_1, v_2, \ldots, v_r, v_{r+1}, \ldots, v_n\}$ is a basis for V, we have

\[ v = c_1 v_1 + c_2 v_2 + \cdots + c_r v_r + c_{r+1} v_{r+1} + \cdots + c_n v_n. \]

Since $v_1, v_2, \ldots, v_r$ lie in ker(T), we obtain $T(v_1) = T(v_2) = \cdots = T(v_r) = 0$, so
that

\[ w = T(v) = c_{r+1} T(v_{r+1}) + \cdots + c_n T(v_n). \]

Thus, R(T) = span(S).
We next show that S is a linearly independent set. Suppose that

\[ k_{r+1} T(v_{r+1}) + \cdots + k_n T(v_n) = 0. \tag{7.1} \]

Since T is linear, (7.1) can be rewritten as

\[ T(k_{r+1} v_{r+1} + \cdots + k_n v_n) = 0, \]

which says that $k_{r+1} v_{r+1} + \cdots + k_n v_n \in \ker(T)$. This vector can therefore be written
as a linear combination of the basis vectors $\{v_1, v_2, \ldots, v_r\}$, say

\[ k_{r+1} v_{r+1} + \cdots + k_n v_n = k_1 v_1 + \cdots + k_r v_r. \]

Thus,

\[ k_1 v_1 + \cdots + k_r v_r - k_{r+1} v_{r+1} - \cdots - k_n v_n = 0. \]

Since $\{v_1, v_2, \ldots, v_r, v_{r+1}, \ldots, v_n\}$ is linearly independent, all of the k's are zero.
In particular,

\[ k_{r+1} = \cdots = k_n = 0. \]

Hence $S = \{T(v_{r+1}), \ldots, T(v_n)\}$ is linearly independent. Consequently, S forms a
basis for R(T). Therefore,

\[ \dim(R(T)) + \dim(\ker(T)) = (n - r) + r = n. \]


Remark In Theorem 7.5, if $T = T_A$ is a matrix transformation from $R^n$ to $R^m$,
where A is an $m \times n$ matrix, then it follows from Theorem 7.4 that

\[ \mathrm{rank}(A) + \mathrm{nullity}(A) = n. \]

Thus, Theorem 4.23 is a special case of Theorem 7.5.
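
The identity rank(A) + nullity(A) = n can be checked on a small example. The following Python sketch is only an illustration, not part of the text's development: the matrix `A` and the helper names `rref`, `nullspace_basis`, and `matvec` are assumptions made up for this example. It row-reduces A with exact fractions, counts the pivot columns to get the rank, and builds one kernel vector per free column to get the nullity.

```python
from fractions import Fraction


def rref(rows):
    """Row-reduce a matrix (list of rows) exactly; return (rref, pivot_columns)."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    pivots = []
    r = 0  # index of the next pivot row
    for c in range(n_cols):
        if r == n_rows:
            break
        # Look for a nonzero entry in column c at or below row r.
        p = next((i for i in range(r, n_rows) if m[i][c] != 0), None)
        if p is None:
            continue  # column c is a free column
        m[r], m[p] = m[p], m[r]
        pivot = m[r][c]
        m[r] = [x / pivot for x in m[r]]
        for i in range(n_rows):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    return m, pivots


def nullspace_basis(rows):
    """Return a basis of the nullspace of A, one vector per free column."""
    m, pivots = rref(rows)
    n_cols = len(rows[0])
    basis = []
    for f in (c for c in range(n_cols) if c not in pivots):
        v = [Fraction(0)] * n_cols
        v[f] = Fraction(1)
        for i, c in enumerate(pivots):
            v[c] = -m[i][f]  # pivot variable expressed through the free one
        basis.append(v)
    return basis


def matvec(rows, v):
    """Matrix-vector product A v."""
    return [sum(a * b for a, b in zip(row, v)) for row in rows]


# Hypothetical 3 x 4 example: the third row is the sum of the first two.
A = [[1, 2, 0, 1],
     [0, 1, 1, 0],
     [1, 3, 1, 1]]

rank = len(rref(A)[1])        # number of pivot columns = dim of column space
kernel = nullspace_basis(A)   # basis of ker(T_A)
nullity = len(kernel)
print(rank, nullity, rank + nullity == len(A[0]))
```

Here n = 4 is the number of columns of A, that is, the dimension of the domain of $T_A$.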


7.3 Inverse Linear Transformations 


In Subsection 3.3.3, we discussed some properties of one-to-one linear transformations
from $R^n$ to $R^n$. In this section, we extend those ideas to general linear
transformations.


7.3.1 One-to-one and onto linear transformations 


Definition A linear transformation $T\colon V \to W$ is said to be one-to-one if T maps
distinct vectors in V into distinct vectors in W, i.e., for any vectors u and v in V,

if $u \ne v$, then $T(u) \ne T(v)$.


Definition A linear transformation $T\colon V \to W$ is said to be onto if every vector
in W is the image of at least one vector in V, i.e., for every vector w in W, there
is a vector v in V such that $T(v) = w$.


The following theorem establishes a relationship between a one-to-one linear 
transformation and its kernel. 


Theorem 7.6 Let $T\colon V \to W$ be a linear transformation. Then the following are
equivalent.
(a) T is one-to-one.
(b) ker(T) = {0}.

Proof (a) $\Rightarrow$ (b): Let $u \in V$. By (a), if $u \ne 0$, then

\[ T(u) \ne T(0) = 0, \]

i.e., $u \notin \ker(T)$. Therefore, ker(T) = {0}.

(b) $\Rightarrow$ (a): If $u \ne v$, then $u - v \ne 0$. Hence $u - v$ is not in ker(T) by (b). We obtain

\[ T(u) - T(v) = T(u - v) \ne 0, \]

i.e.,

\[ T(u) \ne T(v). \]

Thus, T is one-to-one.


Furthermore, if the vector spaces V and W have the same dimension, then the 
following theorem shows one more equivalent property. The proof of the theorem is 
left as an exercise. 


Theorem 7.7 Let V and W be finite-dimensional vector spaces with the same
dimension, and $T\colon V \to W$ be a linear transformation. Then the following are
equivalent.

(a) T is one-to-one.

(b) ker(T) = {0}.

(c) R(T) = W, i.e., T is onto.

Example In each part, determine whether the linear transformation is one-to-one,
onto, both, or neither.

(a) $T\colon R^2 \to R^2$ rotates each vector through the angle $\theta$.

(b) $T\colon R^3 \to R^3$ is the orthogonal projection on the xy-plane.

(c) $T_A\colon R^4 \to R^3$ is multiplication by the matrix


Solution For (a), note that ker(T) = {0}. By Theorem 7.7, T is both one-to-one and onto.

For (b), since ker(T) is the z-axis, which contains nonzero vectors, T is not
one-to-one by Theorem 7.6, and hence not onto by Theorem 7.7.

For (c), note that rank(A) = 3 and nullity(A) = 1. Since $\dim(\ker(T_A)) =$
nullity(A) = 1, i.e., $\ker(T_A) \ne \{0\}$, it follows from Theorem 7.6 that $T_A$ is not
one-to-one. However, since rank(A) = 3, it follows from Theorem 4.26 that the
linear system $Ax = b$ is consistent for every vector $b \in R^3$. Thus, $R(T_A) = R^3$,
i.e., $T_A$ is onto.


7.3.2 Inverse linear transformations


The inverse transformation of a one-to-one transformation $T\colon V \to W$, denoted by
$T^{-1}$, is defined as the function that maps $w = T(v) \in R(T) \subseteq W$ back into $v$ for
any $v \in V$. See Figure 7.4.


Figure 7.4 T maps V onto R(T), and $T^{-1}$ maps R(T) back to V


We now show that $T^{-1}\colon R(T) \to V$ is a linear transformation. Note that from the
definition of $T^{-1}$,

\[ T^{-1} \circ T = I_V, \quad T \circ T^{-1} = I_{R(T)}, \tag{7.2} \]

where $I_V$ is the identity transformation on V and $I_{R(T)}$ is the identity transformation
on R(T). Thus, for any $u, w \in R(T)$, we deduce by using (7.2),

\[
\begin{aligned}
T^{-1}(u + w) &= T^{-1}[(T \circ T^{-1})(u) + (T \circ T^{-1})(w)] = T^{-1}[T(T^{-1}(u)) + T(T^{-1}(w))] \\
&= T^{-1}[T(T^{-1}(u) + T^{-1}(w))] = (T^{-1} \circ T)[T^{-1}(u) + T^{-1}(w)] \\
&= T^{-1}(u) + T^{-1}(w),
\end{aligned}
\]

and for any scalar k,

\[
\begin{aligned}
T^{-1}(kw) &= T^{-1}[k(T \circ T^{-1})(w)] = T^{-1}[k(T(T^{-1}(w)))] \\
&= T^{-1}[T(kT^{-1}(w))] = (T^{-1} \circ T)[kT^{-1}(w)] \\
&= kT^{-1}(w).
\end{aligned}
\]

Hence $T^{-1}$ is a linear transformation.


The following theorem lists an important property of one-to-one linear
transformations.


Theorem 7.8 Let $T_1\colon U \to V$ and $T_2\colon V \to W$ be one-to-one linear transformations.
Then $T_2 \circ T_1$ is one-to-one.

Proof Since both $T_1$ and $T_2$ are one-to-one linear transformations, for vectors u
and v in U with $u \ne v$, we have

\[ T_1(u) \ne T_1(v), \]

where $T_1(u)$ and $T_1(v)$ are vectors in V. Moreover,

\[ (T_2 \circ T_1)(u) = T_2(T_1(u)) \ne T_2(T_1(v)) = (T_2 \circ T_1)(v), \]

where $(T_2 \circ T_1)(u)$ and $(T_2 \circ T_1)(v)$ are vectors in W. Thus, $T_2 \circ T_1$ is one-to-one.


Remark In general, if $T_j$ is a one-to-one linear transformation from the vector
space $V_{j-1}$ to another vector space $V_j$ for $1 \le j \le n$, then $T_n \circ T_{n-1} \circ \cdots \circ T_2 \circ T_1$ is
one-to-one.


7.4 Matrices of General Linear Transformations 


In this section, we show that if V and W are finite-dimensional vector spaces, then
by using bases for V and W, any linear transformation $T\colon V \to W$ can be regarded
as a matrix transformation.


7.4.1 Matrices of linear transformations 


Let T be a linear transformation between two finite-dimensional vector spaces V
and W with dim(V) = n and dim(W) = m, respectively. We have the following
relationship. See Figure 7.5.


A vector in V       $x \longmapsto T(x)$                  A vector in W
A vector in $R^n$     $[x]_B$        $[T(x)]_{B'}$        A vector in $R^m$
Figure 7.5


Here $[x]_B$ is the coordinate vector of x relative to a basis B for V and $[T(x)]_{B'}$ is
the coordinate vector of T(x) relative to a basis B' for W. In the following, we show
that there exists a matrix A such that

\[ A[x]_B = [T(x)]_{B'}. \tag{7.3} \]

See Figure 7.6.


Figure 7.6 T maps x to T(x); in coordinates, multiplication by A maps $[x]_B$ to $[T(x)]_{B'}$


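We are now going to construct A. Let

\[ B = \{u_1, u_2, \ldots, u_n\} \subset V, \quad B' = \{v_1, v_2, \ldots, v_m\} \subset W. \]

Note that

\[ V = \mathrm{span}\{u_1, u_2, \ldots, u_n\}, \quad W = \mathrm{span}\{v_1, v_2, \ldots, v_m\}. \]

Since $B' = \{v_1, v_2, \ldots, v_m\}$ is a basis for W and $T(u_j) \in W$ for $1 \le j \le n$, we have

\[ T(u_j) = \sum_{i=1}^{m} k_{ij} v_i = [v_1, v_2, \ldots, v_m] \begin{bmatrix} k_{1j} \\ k_{2j} \\ \vdots \\ k_{mj} \end{bmatrix}. \]

Therefore,

\[ [T(u_j)]_{B'} = [k_{1j}, k_{2j}, \ldots, k_{mj}]^T \in R^m. \tag{7.4} \]

It implies

\[ [T(u_1), T(u_2), \ldots, T(u_n)] = [v_1, v_2, \ldots, v_m] \begin{bmatrix} k_{11} & k_{12} & \cdots & k_{1n} \\ k_{21} & k_{22} & \cdots & k_{2n} \\ \vdots & \vdots & & \vdots \\ k_{m1} & k_{m2} & \cdots & k_{mn} \end{bmatrix}. \tag{7.5} \]

Let

\[ x = \sum_{i=1}^{n} x_i u_i = [u_1, u_2, \ldots, u_n] \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \in V. \]

Then

\[ [x]_B = [x_1, x_2, \ldots, x_n]^T \in R^n. \]

It follows from (7.5) that

\[
\begin{aligned}
T(x) &= T\Big(\sum_{i=1}^{n} x_i u_i\Big) = \sum_{i=1}^{n} x_i T(u_i) = [T(u_1), T(u_2), \ldots, T(u_n)] \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \\
&= [v_1, v_2, \ldots, v_m] \begin{bmatrix} k_{11} & k_{12} & \cdots & k_{1n} \\ k_{21} & k_{22} & \cdots & k_{2n} \\ \vdots & \vdots & & \vdots \\ k_{m1} & k_{m2} & \cdots & k_{mn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \in W.
\end{aligned}
\tag{7.6}
\]

Thus, we have by (7.6) and (7.4),

\[ [T(x)]_{B'} = [k_{ij}]\,[x]_B = \big[\, [T(u_1)]_{B'} \mid [T(u_2)]_{B'} \mid \cdots \mid [T(u_n)]_{B'} \,\big] [x]_B. \tag{7.7} \]

Comparing (7.7) with (7.3), we therefore obtain

\[ A = \big[\, [T(u_1)]_{B'} \mid [T(u_2)]_{B'} \mid \cdots \mid [T(u_n)]_{B'} \,\big] \in R^{m \times n}, \]

where A is called the matrix for T relative to the bases B and B' and is usually
denoted by $[T]_{B',B}$. Furthermore, (7.5) can be written as

\[ [T(u_1), T(u_2), \ldots, T(u_n)] = [v_1, v_2, \ldots, v_m] [T]_{B',B}. \tag{7.8} \]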


Remark When V = W, it is usual to take B' = B when constructing a matrix for
T. In this case the resulting matrix is called the matrix for T relative to the basis
B and is usually denoted by $[T]_B$ rather than $[T]_{B,B}$. If $B = \{u_1, u_2, \ldots, u_n\}$, then
in this case we obtain

\[ [T]_B = \big[\, [T(u_1)]_B \mid [T(u_2)]_B \mid \cdots \mid [T(u_n)]_B \,\big] \tag{7.9} \]

and

\[ [T]_B [x]_B = [T(x)]_B. \tag{7.10} \]

Phrased informally, (7.9) and (7.10) state that the matrix for T times the coordinate
vector for x is the coordinate vector for T(x).


Example 1 Let $T\colon P_1 \to P_2$ be the linear transformation defined by

\[ T(p(x)) = x\,p(x). \]

Find the matrix for T relative to the bases

\[ B = \{u_1, u_2\}, \quad B' = \{v_1, v_2, v_3\}, \]

where

\[ u_1 = 1, \quad u_2 = x; \quad v_1 = 1, \quad v_2 = x, \quad v_3 = x^2. \]

Solution We have by (7.8),

\[ [T(u_1), T(u_2)] = [T(1), T(x)] = [x, x^2] = [1, x, x^2] \underbrace{\begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}}_{[T]_{B',B}}. \]


Example 2 Let $T\colon P_2 \to P_2$ be the linear transformation defined by

\[ T(p(x)) = p(2x + 1), \]

i.e., $T(c_0 + c_1 x + c_2 x^2) = c_0 + c_1(2x + 1) + c_2(2x + 1)^2$.

(a) Find $[T]_B$ relative to the basis $B = \{1, x, x^2\}$.
(b) Compute $T(2 + 3x + 7x^2)$ by using (7.10).
(c) Check the result in (b) by computing $T(2 + 3x + 7x^2)$ directly.

Solution For (a), we have from the definition of T,

\[ T(1) = 1, \quad T(x) = 2x + 1, \quad T(x^2) = (2x + 1)^2 = 4x^2 + 4x + 1. \]

It follows from (7.8) that

\[ [T(1), T(x), T(x^2)] = [1, 2x + 1, 4x^2 + 4x + 1] = [1, x, x^2] \underbrace{\begin{bmatrix} 1 & 1 & 1 \\ 0 & 2 & 4 \\ 0 & 0 & 4 \end{bmatrix}}_{[T]_B}. \]

For (b), the coordinate vector relative to B of the vector $p = 2 + 3x + 7x^2$ is

\[ [p]_B = \begin{bmatrix} 2 \\ 3 \\ 7 \end{bmatrix}. \]

Thus, we have by using (7.10),

\[ [T(p)]_B = [T]_B [p]_B = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 2 & 4 \\ 0 & 0 & 4 \end{bmatrix} \begin{bmatrix} 2 \\ 3 \\ 7 \end{bmatrix} = \begin{bmatrix} 12 \\ 34 \\ 28 \end{bmatrix}. \]

It follows that

\[ T(2 + 3x + 7x^2) = [1, x, x^2] \begin{bmatrix} 12 \\ 34 \\ 28 \end{bmatrix} = 12 + 34x + 28x^2. \]

For (c), we have by direct computation,

\[ T(2 + 3x + 7x^2) = 2 + 3(2x + 1) + 7(2x + 1)^2 = 12 + 34x + 28x^2, \]

which agrees with the result in (b).
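
The computation in Example 2 can also be reproduced mechanically. The Python sketch below is an illustration only: the representation of a polynomial as a coefficient list `[c0, c1, c2]` and the helper names `poly_mul` and `T` are assumptions chosen for this example. It builds the matrix column by column from the images of the basis vectors, as in (7.9), and compares the result of (7.10) with direct substitution.

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists [c0, c1, ...]."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def T(p):
    """Return the coefficients of p(2x + 1) for p = [c0, c1, c2]."""
    result = [0] * len(p)
    power = [1]  # coefficients of (2x + 1)^0
    for c in p:
        for i, a in enumerate(power):
            result[i] += c * a
        power = poly_mul(power, [1, 2])  # next power of (1 + 2x)
    return result


basis = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # 1, x, x^2 as coefficient lists
columns = [T(b) for b in basis]            # coordinate vectors [T(u_j)]_B
matrix = [[columns[j][i] for j in range(3)] for i in range(3)]

p = [2, 3, 7]  # 2 + 3x + 7x^2
via_matrix = [sum(matrix[i][j] * p[j] for j in range(3)) for i in range(3)]
direct = T(p)
print(matrix, via_matrix, direct)
```

Both routes give the coefficient list of $12 + 34x + 28x^2$, matching parts (b) and (c).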


7.4.2 Matrices of compositions and inverse transformations 


The following theorem is a generalization of (3.5) in Subsection 3.2.3. The proof of
the theorem is left as an exercise.


Theorem 7.9 Let $T_1\colon V_0 \to V_1$ and $T_2\colon V_1 \to V_2$ be linear transformations, and let $B_0$,
$B_1$, and $B_2$ be bases for $V_0$, $V_1$, and $V_2$, respectively. Then

\[ [T_2 \circ T_1]_{B_2,B_0} = [T_2]_{B_2,B_1} [T_1]_{B_1,B_0}. \tag{7.11} \]


Remark In (7.11), observe how the interior subscript $B_1$ (the basis for the
intermediate space $V_1$) seems to "cancel out", leaving only the bases for the domain
and image space of the composition as subscripts:

\[ [T_2 \circ T_1]_{B_2,B_0} = [T_2]_{B_2,\underline{B_1}} [T_1]_{\underline{B_1},B_0}. \]

The two underlined subscripts are the ones that cancel.

This cancelation of interior subscripts suggests the following extension of (7.11) to
the composition of three linear transformations. Let $B_0$, $B_1$, $B_2$, and $B_3$ be bases for
vector spaces $V_0$, $V_1$, $V_2$, and $V_3$, respectively, and $T_j$ be a linear transformation
from $V_{j-1}$ to $V_j$ for $j = 1, 2, 3$. See Figure 7.7.


Figure 7.7 $T_1$, $T_2$, $T_3$ map $V_0 \to V_1 \to V_2 \to V_3$, with bases $B_0$, $B_1$, $B_2$, $B_3$

Therefore,

\[ [T_3 \circ T_2 \circ T_1]_{B_3,B_0} = [T_3]_{B_3,B_2} [T_2]_{B_2,B_1} [T_1]_{B_1,B_0}. \]

In general, we have

\[ [T_n \circ T_{n-1} \circ \cdots \circ T_2 \circ T_1]_{B_n,B_0} = [T_n]_{B_n,B_{n-1}} [T_{n-1}]_{B_{n-1},B_{n-2}} \cdots [T_2]_{B_2,B_1} [T_1]_{B_1,B_0}, \]

where $B_k$ is a basis for a vector space $V_k$ for $0 \le k \le n$, and $T_j$ is a linear
transformation from $V_{j-1}$ to $V_j$ for $1 \le j \le n$.


Theorem 7.10 Let $T\colon V \to V$ be a linear transformation and B be a basis for V.
Then the following are equivalent.

(a) T is one-to-one.

(b) $[T]_B$ is invertible. Moreover, $[T]_B^{-1} = [T^{-1}]_B$.


Proof Note that T is one-to-one if and only if $T^{-1}$ exists and $T^{-1} \circ T = I_V$, where
$I_V$ is the identity transformation on V. Then

\[ T^{-1} \circ T = I_V \iff [T^{-1}]_B [T]_B = [T^{-1} \circ T]_B = [I_V]_B = I, \]

i.e., $[T]_B$ is an invertible matrix and $[T]_B^{-1} = [T^{-1}]_B$.
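
Formula (7.11) can be checked on a small case. The Python sketch below is an illustration under assumptions that are not taken from the text: $T_1$ is differentiation from $P_2$ to $P_1$, $T_2$ is multiplication by x from $P_1$ to $P_2$, all bases are the monomial bases, and the helper names `matmul` and `matrix_of` are made up for this example.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def T1(p):
    """Differentiation P2 -> P1: [c0, c1, c2] -> [c1, 2*c2]."""
    return [p[1], 2 * p[2]]


def T2(p):
    """Multiplication by x, P1 -> P2: [c0, c1] -> [0, c0, c1]."""
    return [0, p[0], p[1]]


def matrix_of(T, dim_in):
    """Matrix of T in monomial bases: column j is the image of the j-th basis vector."""
    cols = []
    for j in range(dim_in):
        e = [0] * dim_in
        e[j] = 1
        cols.append(T(e))
    return [[col[i] for col in cols] for i in range(len(cols[0]))]


M1 = matrix_of(T1, 3)                    # [T1], a 2 x 3 matrix
M2 = matrix_of(T2, 2)                    # [T2], a 3 x 2 matrix
M21 = matrix_of(lambda p: T2(T1(p)), 3)  # [T2 o T1], a 3 x 3 matrix
print(matmul(M2, M1) == M21)
```

The product of the two matrices agrees with the matrix built directly from the composition $p(x) \mapsto x\,p'(x)$.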


7.5 Similarity 


The matrix for a linear transformation $T\colon V \to V$ depends on the basis
we choose for V. Selecting an appropriate basis for V can simplify the matrix for T
to a diagonal or triangular matrix. We first introduce the following definition.


Definition If A and B are square matrices, we say that B is similar to A if there
is an invertible matrix P such that $B = P^{-1}AP$.


Theorem 7.11 Let $B = \{v_1, v_2, \ldots, v_n\}$ and $B' = \{w_1, w_2, \ldots, w_n\}$ be two bases
for a vector space V, and let T be a linear transformation on V. Then $[T]_{B'}$ is
similar to $[T]_B$. More precisely,

\[ [T]_{B'} = P^{-1} [T]_B P, \]

where P is the transition matrix from B to B'.
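Proof Let $P = [p_{ij}] \in R^{n \times n}$. Since

\[ [w_1, w_2, \ldots, w_n] = [v_1, v_2, \ldots, v_n] P, \]

we have for $1 \le j \le n$,

\[ T(w_j) = T\Big(\sum_{i=1}^{n} p_{ij} v_i\Big) = \sum_{i=1}^{n} p_{ij} T(v_i) = [T(v_1), T(v_2), \ldots, T(v_n)] \begin{bmatrix} p_{1j} \\ \vdots \\ p_{nj} \end{bmatrix}, \]

which implies

\[ [T(w_1), T(w_2), \ldots, T(w_n)] = [T(v_1), T(v_2), \ldots, T(v_n)] P. \tag{7.12} \]

Since P is invertible by Theorem 5.21, we obtain

\[ [v_1, v_2, \ldots, v_n] = [w_1, w_2, \ldots, w_n] P^{-1}. \tag{7.13} \]

Furthermore, we have by (7.8),

\[ [T(v_1), T(v_2), \ldots, T(v_n)] = [v_1, v_2, \ldots, v_n] [T]_B \tag{7.14} \]

and

\[ [T(w_1), T(w_2), \ldots, T(w_n)] = [w_1, w_2, \ldots, w_n] [T]_{B'}. \tag{7.15} \]

It follows from (7.12), (7.14), and (7.13) that

\[
\begin{aligned}
[T(w_1), T(w_2), \ldots, T(w_n)] &= [T(v_1), T(v_2), \ldots, T(v_n)] P \\
&= [v_1, v_2, \ldots, v_n] [T]_B P = [w_1, w_2, \ldots, w_n] P^{-1} [T]_B P.
\end{aligned}
\]

Comparing with (7.15), we deduce

\[ [T]_{B'} = P^{-1} [T]_B P. \]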
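
The relation $[T]_{B'} = P^{-1}[T]_B P$ can be checked numerically. The Python sketch below is an illustration only: the matrices `T_B` and `P` and the helper names are assumptions made up for this example, with B taken to be the standard basis of $R^2$ so that the columns of P are the vectors $w_1$ and $w_2$. The matrix $[T]_{B'}$ is first built column by column from the B'-coordinates of $T(w_j)$ and then compared with the formula.

```python
from fractions import Fraction


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def inverse_2x2(P):
    """Inverse of an invertible 2 x 2 matrix, computed exactly."""
    (a, b), (c, d) = P
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]


def coordinates(P, y):
    """Solve P c = y by Cramer's rule, i.e. find the B'-coordinates of y."""
    (a, b), (c, d) = P
    det = Fraction(a * d - b * c)
    return [(y[0] * d - b * y[1]) / det, (a * y[1] - c * y[0]) / det]


T_B = [[1, 2], [3, 4]]  # hypothetical [T]_B, B = standard basis of R^2
P = [[1, 1], [1, 2]]    # columns are w_1 = [1, 1]^T and w_2 = [1, 2]^T

# Column j of [T]_{B'} holds the B'-coordinates of T(w_j).
images = [[sum(T_B[i][k] * P[k][j] for k in range(2)) for i in range(2)]
          for j in range(2)]
cols = [coordinates(P, y) for y in images]
T_Bprime_direct = [[cols[j][i] for j in range(2)] for i in range(2)]

T_Bprime_formula = matmul(matmul(inverse_2x2(P), T_B), P)
print(T_Bprime_direct == T_Bprime_formula)
```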


Remark It follows from the Jordan decomposition theorem that every square
matrix A is similar to a bi-diagonal matrix. Moreover, if A is symmetric, then A is
similar to a diagonal matrix.


Similar matrices always share some important properties and we list a few of
them in Table 7.1. The proofs of the results in the table are left as an exercise. See
Exercise 7.23.


Table 7.1
Property                     Description
Determinant                  $A$ and $P^{-1}AP$ have the same determinant.
Invertibility                $A$ is invertible if and only if $P^{-1}AP$ is invertible.
Rank                         $A$ and $P^{-1}AP$ have the same rank.
Nullity                      $A$ and $P^{-1}AP$ have the same nullity.
Trace                        $A$ and $P^{-1}AP$ have the same trace.
Characteristic polynomial    $A$ and $P^{-1}AP$ have the same characteristic polynomial.
Eigenvalues                  $A$ and $P^{-1}AP$ have the same eigenvalues.
Eigenspace dimension         If $\lambda$ is an eigenvalue of $A$ and $P^{-1}AP$, then the
                             eigenspace of $A$ corresponding to $\lambda$ and the eigenspace
                             of $P^{-1}AP$ corresponding to $\lambda$ have the same dimension.


Remark Let M be an $n \times n$ invertible matrix. Define a transformation $T\colon R^{n \times n} \to R^{n \times n}$
by $T(A) = M^{-1}AM$ for all A in $R^{n \times n}$. It is easy to show that T, called
the similarity transformation, is linear.
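
A few rows of Table 7.1 can be spot-checked on a $2 \times 2$ example. The Python sketch below is an illustration only and proves nothing: the matrices `A` and `P` and the helper names are assumptions made up for this example. For a $2 \times 2$ matrix the characteristic polynomial is $t^2 - \mathrm{tr}(A)\,t + \det(A)$, so equal trace and determinant already give equal characteristic polynomials and eigenvalues.

```python
from fractions import Fraction


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def inverse_2x2(P):
    """Inverse of an invertible 2 x 2 matrix, computed exactly."""
    (a, b), (c, d) = P
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]


def trace(A):
    return A[0][0] + A[1][1]


def det(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]


def char_poly(A):
    """Coefficients (1, -trace, det) of det(tI - A) = t^2 - trace(A) t + det(A)."""
    return (1, -trace(A), det(A))


A = [[2, 1], [0, 3]]  # hypothetical matrix
P = [[1, 2], [3, 5]]  # hypothetical invertible matrix, det(P) = -1
B = matmul(matmul(inverse_2x2(P), A), P)  # B = P^{-1} A P

print(trace(A) == trace(B), det(A) == det(B), char_poly(A) == char_poly(B))
```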


Exercises 
Elementary exercises 
7.1 Show that none of the following transformations is linear.

(a) $T\colon R^{2 \times 2} \to R$ defined by $T(A) = \det(A)$.

(b) $T\colon R \to R$ defined by $T(x) = x^2$.

(c) $T\colon R \to R$ defined by $T(x) = x + 1$.
7.2 Consider the basis $S = \{v_1, v_2, v_3\}$ for $R^3$, where

vı = (lay. wee MO v3 = [1,0,0]". 
Let $T\colon R^3 \to R^2$ be the linear transformation such that
$T(v_1) = [1, 0]^T$, $T(v_2) = [2, -1]^T$, $T(v_3) = [4, 3]^T$.
(a) Find a formula for T(x) for all $x = [x_1, x_2, x_3]^T \in R^3$.
(b) Use the formula in (a) to compute T(x) if $x = [2, -3, 5]^T$.


7.3 Suppose that T: R? — P, is the linear transformation such that 


r([1])<a-aee 2([2]) 1-2 
roir([-t]) mx((2]) 


7.4 Let V be an n-dimensional vector space and $T\colon V \to V$ be defined by

\[ T(v) = 2v. \]

Find the kernel, range, rank, and nullity of T.


7.5 Show that Theorem 7.5 holds in the cases of $\dim(\ker(T)) = 0$ and
$\dim(\ker(T)) = n$.


7.6 In each part, determine whether the linear transformation is one-to-one by
finding the kernel or the nullity.


o(s)-(22} os) Lets} 


u+y x 
x r-z 
or( |}- x |. OT|jy al | 
y y 
y z 
7.7 Let $T_1\colon U \to V$ and $T_2\colon V \to W$ be linear transformations. Show that if $T_2 \circ T_1$
is one-to-one, so is $T_1$.


c+y 
cT—Y 


7.8 Suppose that the linear transformations $T_1\colon P_2 \to P_2$ and $T_2\colon P_2 \to P_3$ are
given as follows:

\[ T_1(p(x)) = p(x + 1), \quad T_2(p(x)) = x\,p(x). \]

Find $(T_2 \circ T_1)(a_0 + a_1 x + a_2 x^2)$.


7.9 Let $T\colon P_2 \to P_2$ be the linear transformation given by the formula $T(p(x)) =
p(2x + 1)$.

(a) Find a matrix for T relative to the basis $B = \{1, x, x^2\}$.

(b) Find the rank and nullity of T.

(c) Use the result in (b) to determine whether T is one-to-one.
7.10 Prove Theorem 7.7. 


7.11 Show that the linear transformation T: R? —> P, defined by 
a 
al |) -etero 


b 
7.12 If T4: R? — R? is defined by T4 (x) = Ax, then determine whether T4 has 


an inverse. 


is both one-to-one and onto.


7.13 Let $T\colon R^2 \to R^2$ be defined by 


a 


and let B = {u;1, u2} and B’ = {v1, v2} be bases for R?, where 


zı + 3X9 
321 a 4x9 , 


u = [0,2]7, we =[2,-1]7; vi =[1,2]7?, ve =[-1,0]7. 


Find the matrix for T relative to the basis B, and find the matrix for T relative to 
the bases B and B’. 


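7.14 Let $T\colon R^3 \to R^2$ be the linear transformation defined by

\[ T\left(\begin{bmatrix} x \\ y \\ z \end{bmatrix}\right) = \begin{bmatrix} x - 2y \\ x + y - 3z \end{bmatrix}, \]

and let $B = \{u_1, u_2, u_3\}$ and $B' = \{v_1, v_2\}$ be bases for $R^3$ and $R^2$, respectively,
where

\[ u_1 = [1, 0, 0]^T, \quad u_2 = [0, 1, 0]^T, \quad u_3 = [0, 0, 1]^T; \quad v_1 = [0, 1]^T, \quad v_2 = [1, 0]^T. \]

(a) Find the matrix for T relative to the bases B and B'.
(b) Find $[T(v)]_{B'}$ if $[v]_B = [1, 3, -2]^T$.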


7.15 Let V be an n-dimensional vector space and I be the identity transformation
on V. What is the matrix for I relative to two distinct bases B and B' for V?


7.16 Let $T\colon P_1 \to P_1$ be the linear transformation defined by

\[ T(p(x)) = p(x + 1), \]

and let $B = \{p_1, p_2\}$ and $B' = \{q_1, q_2\}$ be bases for $P_1$, where

\[ p_1 = 6 + 3x, \quad p_2 = 10 + 2x; \quad q_1 = 2, \quad q_2 = 3 + 2x. \]

(a) Find the matrix $[T]_{B',B}$ relative to the bases B and B'.

(b) If $p = 1 + 3x$, then find $[T(p)]_{B'}$ by using the matrix $[T]_{B',B}$.


7.17 Let 
1 3 -1 
AS 0 5 
6 —2 4 


be the matrix for $T\colon P_2 \to P_2$ relative to the basis $B = \{p_1, p_2, p_3\}$, where

\[ p_1 = 3x + 3x^2, \quad p_2 = -1 + 3x + 2x^2, \quad p_3 = 3 + 7x + 2x^2. \]

(a) Find $T(p_1)$, $T(p_2)$, and $T(p_3)$.

(b) Find $[T(p_1)]_B$, $[T(p_2)]_B$, and $[T(p_3)]_B$.
7.18 Let

\[ A = \begin{bmatrix} 1 & 3 \\ -2 & 5 \end{bmatrix} \]

be the matrix for $T\colon R^2 \to R^2$ relative to the basis $B = \{v_1, v_2\}$, where

\[ v_1 = [1, 3]^T, \quad v_2 = [-1, 4]^T. \]

(a) Find $T(v_1)$ and $T(v_2)$.

(b) Find $[T(v_1)]_B$, $[T(v_2)]_B$, and T(u), where $u = [1, 1]^T$.


7.19 Let $T\colon P_2 \to P_2$ be the linear transformation defined by $T(p(x)) = p(x + 1)$.

(a) Find the matrix $[T]_{B',B}$ relative to the bases $B = \{1, x, x^2\}$ and $B' = \{x, x^2, 1\}$.

(b) If $p = 1 + 2x + 3x^2$, find $[T(p)]_{B'}$ by using the matrix $[T]_{B',B}$.


7.20 Verify that the linear transformations T;: R? > P, and Tz: P) + R? defined 


by 
a 


n( |] -0+ ero nerdy =| 4° | 


are inverses of each other. 


7.21 Let $T_1\colon U \to V$ and $T_2\colon V \to W$ be one-to-one linear transformations, where
U, V, and W are vector spaces with the same dimension. Show that
$(T_2 \circ T_1)^{-1} = T_1^{-1} \circ T_2^{-1}$.
7.22 Prove Theorem 7.9. 
7.23 Prove all the properties in Table 7.1. 


7.24 Suppose that 


Find the values of a and b if A is similar to B. 


7.25 Show that if A and B are similar, then $A^T$ and $B^T$ are similar.


7.26 Show that if two invertible matrices A and B are similar, then $A^{-1}$ and $B^{-1}$
are similar.


Challenge exercises 


7.27 For any linear transformations $T_1$ and $T_2$ from a vector space V into a vector
space W, the operations of addition and scalar multiplication are defined by

\[ (T_1 + T_2)(x) := T_1(x) + T_2(x), \quad (kT_1)(x) := kT_1(x), \]

where $x \in V$ and k is a scalar. Show that the set of all linear transformations from
V to W with these two operations is a vector space.


7.28 Let $v_1, v_2, \ldots, v_n$ be vectors in a vector space V and $T\colon V \to W$ be a linear
transformation.

(a) Show that if $\{T(v_1), T(v_2), \ldots, T(v_n)\}$ is linearly independent in W, then
$\{v_1, v_2, \ldots, v_n\}$ is linearly independent in V.

(b) Show that the converse of (a) is false, i.e., it is not necessarily true that if
$\{v_1, v_2, \ldots, v_n\}$ is linearly independent in V, then $\{T(v_1), T(v_2), \ldots, T(v_n)\}$
is linearly independent in W.

(c) Show that if T is one-to-one, then $\{v_1, v_2, \ldots, v_n\}$ is linearly independent in
V if and only if $\{T(v_1), T(v_2), \ldots, T(v_n)\}$ is linearly independent in W.


7.29 Determine whether each function $T\colon P_2 \to P_2$ is a linear transformation.

(a) $T(a_0 + a_1 x + a_2 x^2) = a_0 + a_1(x + 1) + a_2(x + 1)^2$.

(b) $T(a_0 + a_1 x + a_2 x^2) = (a_0 + 1) + (a_1 + 1)(x + 1) + (a_2 + 1)(x + 1)^2$.

7.30 Let $T\colon P_2 \to P_3$ be the linear transformation defined by

\[ T(p(x)) = x\,p(x). \]

Find bases for the kernel and range of T.


7.31 Find the kernel, range, rank, and nullity of the linear transformation T: P > 
P> defined by 


7.32 Let $T\colon R^2 \to R^2$ be the linear transformation given by the formula

\[ T(x, y) = (8x + y, -4x + 3y). \]

Find bases for the kernel and range of T.


7.33 Find the kernel and the nullity of the linear transformation $T\colon P_1 \to R$ defined
by


7.34 Let V be a finite-dimensional vector space and $T\colon V \to V$ be a linear
transformation. Show that

(a) $\ker(T) \cap R(T) = \{0\}$ if and only if $\mathrm{rank}(T) = \mathrm{rank}(T \circ T)$.
(b) $\ker(T) = R(I - T)$ if $T = T \circ T$.


7.35 Let V be the vector space of all symmetric $2 \times 2$ matrices. Define a linear
transformation $T\colon V \to P_2$ by

\[ T\left(\begin{bmatrix} a & b \\ b & c \end{bmatrix}\right) = (a - b) + (b - c)x + (c - a)x^2. \]

Find the rank and nullity of T.


7.36 Let Tı: Po > P; and To: P3 —> P; be the linear transformations given by the 
formulas 

Tı (p(x)) = p(x), T2(p(a)) = p(z + 1). 
Find the formulas for $T_1^{-1}(p(x))$, $T_2^{-1}(p(x))$, and $(T_2 \circ T_1)^{-1}(p(x))$.


7.37 Let V and W be finite-dimensional vector spaces and dim(W) < dim(V).
Show that there is no one-to-one linear transformation $T\colon V \to W$.


7.38 Let $B = \{v_1, v_2, \ldots, v_n\}$ be a basis for $R^n$ and $P = [p_{ij}] \in R^{n \times n}$ be an
invertible matrix. Show that if

\[ u_i = p_{1i} v_1 + p_{2i} v_2 + \cdots + p_{ni} v_n, \quad 1 \le i \le n, \]

then $B' = \{u_1, u_2, \ldots, u_n\}$ is a basis for $R^n$ and P is the transition matrix from B
to B'.


7.39 Let T: IR?*? — R?*? be defined by 


a b 2c a+c 
T = 
( c T Ea d 


Find the matrix [T] g relative to the basis B = {A(1), A(2), Ag), A(4) }, where 


1 0 0 1 0 0 0 0 
A = A = A = A = . 
(1) | 0 0 a (2) | 0 0 | 3 (3) | 1 0 2 (4) | 0 1 | 


Chapter 8 
Additional Topics 


“God used beautiful mathematics in creating the world.” 


— Paul Dirac 


“The art of doing mathematics consists in finding that special case which contains all the 
germs of generality.” 
— David Hilbert 


In this chapter, we study several important topics in linear algebra. We introduce
quadratic forms, complex inner product spaces, and some special structured
matrices. Finally, we discuss the Böttcher-Wenzel conjecture.


8.1 Quadratic Forms 


In this section we study functions in which the terms are squares of variables or 
products of two variables. Such functions arise in a variety of applications, including 
geometry, vibrations of mechanical systems, statistics, and electrical engineering. 


8.1.1 Introduction to quadratic forms 


Up to now, we have been interested primarily in linear equations of the following
form

\[ a_1 x_1 + a_2 x_2 + \cdots + a_n x_n = b. \]

The expression on the left-hand side of this equation is a linear form, in which all
variables occur to the first power. Now, we are concerned with quadratic forms,
which are functions of the form

\[ a_1 x_1^2 + a_2 x_2^2 + \cdots + a_n x_n^2 + (\text{all possible terms of the form } 2a_k x_i x_j \text{ for } i < j). \]

For instance, a quadratic form in the variables $x_1$ and $x_2$ is

\[ a_1 x_1^2 + a_2 x_2^2 + 2a_3 x_1 x_2 \tag{8.1} \]

and a quadratic form in the variables $x_1$, $x_2$, and $x_3$ is

\[ a_1 x_1^2 + a_2 x_2^2 + a_3 x_3^2 + 2a_4 x_1 x_2 + 2a_5 x_1 x_3 + 2a_6 x_2 x_3. \tag{8.2} \]

The terms in a quadratic form that involve products of different variables are called
the cross-product terms.


Note that (8.1) can be written in matrix form as

\[ [x_1, x_2] \begin{bmatrix} a_1 & a_3 \\ a_3 & a_2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \tag{8.3} \]

and (8.2) can be written as

\[ [x_1, x_2, x_3] \begin{bmatrix} a_1 & a_4 & a_5 \\ a_4 & a_2 & a_6 \\ a_5 & a_6 & a_3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}. \tag{8.4} \]

The products in (8.3) and (8.4) are both of the form $x^T A x$, where x is the column
vector of variables, and A is a symmetric matrix whose diagonal entries are the
coefficients of the squared terms and whose entries off the main diagonal are half
the coefficients of the cross-product terms. By using the Euclidean inner product,
we can write the quadratic form as

\[ x^T A x = x^T (Ax) = \langle Ax, x \rangle = \langle x, Ax \rangle. \tag{8.5} \]
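
The agreement between the expanded form (8.2) and the matrix form (8.4) can be confirmed on sample numbers. The Python sketch below is an illustration only: the coefficient values, the vector `x`, and the helper names are assumptions made up for this example.

```python
def quadratic_form(A, x):
    """Evaluate x^T A x for a square matrix A (list of rows) and a vector x."""
    Ax = [sum(a * xj for a, xj in zip(row, x)) for row in A]
    return sum(xi * yi for xi, yi in zip(x, Ax))


def symmetric_matrix_3(a1, a2, a3, a4, a5, a6):
    """Symmetric matrix of the quadratic form (8.2), as displayed in (8.4)."""
    return [[a1, a4, a5],
            [a4, a2, a6],
            [a5, a6, a3]]


def expanded_form_3(a, x):
    """Evaluate (8.2) term by term."""
    a1, a2, a3, a4, a5, a6 = a
    x1, x2, x3 = x
    return (a1 * x1**2 + a2 * x2**2 + a3 * x3**2
            + 2 * a4 * x1 * x2 + 2 * a5 * x1 * x3 + 2 * a6 * x2 * x3)


coeffs = (1, 2, 3, 4, 5, 6)  # hypothetical coefficients a_1, ..., a_6
x = (1, -1, 2)
A = symmetric_matrix_3(*coeffs)
print(quadratic_form(A, x), expanded_form_3(coeffs, x))
```

Each off-diagonal entry of A appears twice in $x^T A x$, which is why the cross-product coefficients are halved when A is formed.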


There are two important mathematical problems related to quadratic forms.
(1) Find the maximum and minimum values of $x^T A x$ if $x = [x_1, x_2, \ldots, x_n]^T$ is
constrained so that

\[ \|x\| = (x_1^2 + x_2^2 + \cdots + x_n^2)^{1/2} = 1. \]

(2) What conditions must A satisfy for a quadratic form to satisfy $x^T A x > 0$ for
all $x \ne 0$?
We study the problems above in the next two subsections.


8.1.2 Constrained extremum problem 


The goal in this subsection is to find the maximum and
minimum values of $x^T A x$ subject to $\|x\| = 1$. By Theorem 6.7 (a), we know that
all the eigenvalues of a symmetric matrix A are real. Therefore, we can arrange the
eigenvalues of A in decreasing order.

Theorem 8.1 Let A be a symmetric $n \times n$ matrix whose eigenvalues in decreasing
size order are $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$. If x is constrained so that $\|x\| = 1$ with respect
to the Euclidean inner product on $R^n$, then

(a) $\lambda_1 \ge x^T A x \ge \lambda_n$.

(b) $x^T A x = \lambda_1$ if x is an eigenvector of A corresponding to $\lambda_1$, and $x^T A x = \lambda_n$ if
x is an eigenvector of A corresponding to $\lambda_n$.
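
The bounds in Theorem 8.1 can be observed numerically in the $2 \times 2$ case. The Python sketch below is an illustration only: the matrix $[[2, 1], [1, 2]]$ and the helper names are assumptions made up for this example, and the eigenvalues come from the closed formula for a symmetric $2 \times 2$ matrix. Unit vectors are sampled as $x = (\cos t, \sin t)$, and the values of $x^T A x$ stay between the smallest and largest eigenvalue, up to rounding.

```python
import math


def eigenvalues_sym_2x2(a, b, d):
    """Eigenvalues (largest first) of the symmetric matrix [[a, b], [b, d]]."""
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    return mean + radius, mean - radius


def quadratic_form(a, b, d, x):
    """x^T A x for A = [[a, b], [b, d]] and x = (x1, x2)."""
    x1, x2 = x
    return a * x1 * x1 + 2 * b * x1 * x2 + d * x2 * x2


a, b, d = 2.0, 1.0, 2.0  # hypothetical matrix [[2, 1], [1, 2]]
lam_max, lam_min = eigenvalues_sym_2x2(a, b, d)

# Sample unit vectors x = (cos t, sin t) at one-degree steps around the circle.
values = [quadratic_form(a, b, d, (math.cos(t), math.sin(t)))
          for t in (2 * math.pi * k / 360 for k in range(360))]
print(lam_min, min(values), max(values), lam_max)
```

For this matrix the extreme values are attained at the unit eigenvectors, which lie at 45 and 135 degrees, in line with part (b).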


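Proof We only prove (a); the proof of (b) is left as an exercise. Since A is
symmetric, it follows from Theorem 6.6 that there is an orthonormal basis for $R^n$
consisting of eigenvectors of A. Suppose that $S = \{v_1, v_2, \ldots, v_n\}$ is such a basis,
where $v_i$ is the eigenvector corresponding to the eigenvalue $\lambda_i$. Let $\langle \cdot, \cdot \rangle$ be the
Euclidean inner product. It follows from Theorem 5.8 that for any $x \in R^n$,

\[ x = \langle x, v_1 \rangle v_1 + \langle x, v_2 \rangle v_2 + \cdots + \langle x, v_n \rangle v_n. \]

Thus,

\[
\begin{aligned}
Ax &= \langle x, v_1 \rangle A v_1 + \langle x, v_2 \rangle A v_2 + \cdots + \langle x, v_n \rangle A v_n \\
&= \langle x, v_1 \rangle \lambda_1 v_1 + \langle x, v_2 \rangle \lambda_2 v_2 + \cdots + \langle x, v_n \rangle \lambda_n v_n \\
&= \lambda_1 \langle x, v_1 \rangle v_1 + \lambda_2 \langle x, v_2 \rangle v_2 + \cdots + \lambda_n \langle x, v_n \rangle v_n.
\end{aligned}
\]

The coordinate vectors of x and Ax relative to the basis S are

\[ [x]_S = [\langle x, v_1 \rangle, \langle x, v_2 \rangle, \ldots, \langle x, v_n \rangle]^T \]

and

\[ [Ax]_S = [\lambda_1 \langle x, v_1 \rangle, \lambda_2 \langle x, v_2 \rangle, \ldots, \lambda_n \langle x, v_n \rangle]^T. \]

Thus, from Theorem 5.9 (c) and the fact that $\|x\| = 1$, we obtain

\[ \|x\|^2 = \langle x, x \rangle = \langle x, v_1 \rangle^2 + \langle x, v_2 \rangle^2 + \cdots + \langle x, v_n \rangle^2 = 1 \]

and

\[ \langle x, Ax \rangle = \lambda_1 \langle x, v_1 \rangle^2 + \lambda_2 \langle x, v_2 \rangle^2 + \cdots + \lambda_n \langle x, v_n \rangle^2. \]

Using (8.5) and these two equations, we can prove that $x^T A x \le \lambda_1$ as follows:

\[
\begin{aligned}
x^T A x = \langle x, Ax \rangle &= \lambda_1 \langle x, v_1 \rangle^2 + \lambda_2 \langle x, v_2 \rangle^2 + \cdots + \lambda_n \langle x, v_n \rangle^2 \\
&\le \lambda_1 \langle x, v_1 \rangle^2 + \lambda_1 \langle x, v_2 \rangle^2 + \cdots + \lambda_1 \langle x, v_n \rangle^2 \\
&= \lambda_1 \big( \langle x, v_1 \rangle^2 + \langle x, v_2 \rangle^2 + \cdots + \langle x, v_n \rangle^2 \big) = \lambda_1.
\end{aligned}
\]

Similarly, one can show that $x^T A x \ge \lambda_n$.


8.1.3 Positive definite matrix 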
Definition A symmetric matrix A and the quadratic form $x^T A x$ are called
(i) positive definite if $x^T A x > 0$ for all $x \ne 0$.
(ii) positive semidefinite if $x^T A x \ge 0$ for all x.
(iii) negative definite if $x^T A x < 0$ for all $x \ne 0$.
(iv) negative semidefinite if $x^T A x \le 0$ for all x.

Theorem 8.2 A symmetric matrix A is positive definite if and only if all the
eigenvalues of A are positive.


Proof Assume that A is positive definite and $\lambda$ is an eigenvalue of A. Let x be an
eigenvector of A corresponding to $\lambda$, i.e., $Ax = \lambda x$ with $x \ne 0$. Then

\[ 0 < x^T A x = x^T \lambda x = \lambda x^T x = \lambda \|x\|^2, \]

where $\|x\|$ is the Euclidean norm of x. Since $\|x\|^2 > 0$, we have $\lambda > 0$.


Conversely, assume that all eigenvalues of A are positive. We must show that
$x^T A x > 0$ for all $x \ne 0$. If $x \ne 0$, we can normalize x to obtain the vector
$y = x/\|x\|$ with the property $\|y\| = 1$. It now follows from Theorem 8.1 that

\[ y^T A y \ge \lambda_n > 0, \]

where $\lambda_n$ is the smallest eigenvalue of A. Thus,

\[ y^T A y = \frac{1}{\|x\|^2}\, x^T A x > 0, \]

which implies

\[ x^T A x > 0, \]

i.e., A is positive definite.


Similarly we have the following corollary for positive semidefinite matrices.


Corollary A symmetric matrix A is positive semidefinite if and only if all the
eigenvalues of A are nonnegative.


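Our next objective is to give a criterion that can be used to determine whether
a symmetric matrix is positive definite without finding its eigenvalues. To do this it
is helpful to introduce some terminology. If

\[ A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} \]

is a square matrix, then the leading principal submatrices of A are the
submatrices formed from the first r rows and r columns of A for $1 \le r \le n$. These
submatrices are

\[ A_{(1)} = a_{11}, \quad A_{(2)} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \quad A_{(3)} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}, \quad \ldots, \]

and

\[ A_{(n)} = A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}. \]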

Theorem 8.3 A symmetric matrix A is positive definite if and only if every leading 
principal submatrix of A is positive definite. 


The proof of Theorem 8.3 is left as an exercise. 


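A principal submatrix of an $n \times n$ matrix $A = [a_{ij}]$ is a square submatrix
obtained by removing certain rows of A together with the columns that have the same
indices. In fact, for any $1 \le k \le n$, a $k \times k$ principal submatrix of A is given by

\[ \begin{bmatrix} a_{i_1 i_1} & a_{i_1 i_2} & \cdots & a_{i_1 i_k} \\ a_{i_2 i_1} & a_{i_2 i_2} & \cdots & a_{i_2 i_k} \\ \vdots & \vdots & & \vdots \\ a_{i_k i_1} & a_{i_k i_2} & \cdots & a_{i_k i_k} \end{bmatrix}, \]

where $i_1, i_2, \ldots, i_k$ are integers with $1 \le i_1 < i_2 < \cdots < i_k \le n$. For instance,

\[ \begin{bmatrix} a_{22} & a_{24} & a_{28} \\ a_{42} & a_{44} & a_{48} \\ a_{82} & a_{84} & a_{88} \end{bmatrix} \]

is a $3 \times 3$ principal submatrix of an $n \times n$ matrix with $n \ge 8$.


Theorem 8.4 A symmetric matrix A is positive definite if and only if every principal
submatrix of A is positive definite.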


The proof of Theorem 8.4 is left as an exercise. 


8.2 Three Theorems for Symmetric Matrices 


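We list three important theorems which are concerned with eigenvalues of symmetric
matrices.

Theorem 8.5 (Courant-Fischer's Minimax Theorem) If A is an $n \times n$ symmetric
matrix whose eigenvalues in decreasing size order are $\lambda_1(A) \ge \lambda_2(A) \ge \cdots \ge \lambda_n(A)$,
then for $1 \le k \le n$,

\[
\begin{aligned}
\lambda_k(A) &= \max_{\substack{\mathcal{X} \subseteq R^n \\ \dim(\mathcal{X}) = k}} \; \min_{0 \ne x \in \mathcal{X}} \frac{x^T A x}{x^T x} = \max_{\substack{\mathcal{X} \subseteq R^n \\ \dim(\mathcal{X}) = k}} \; \min_{\substack{x \in \mathcal{X} \\ \|x\| = 1}} x^T A x \\
&= \min_{\substack{\mathcal{X} \subseteq R^n \\ \dim(\mathcal{X}) = n-k+1}} \; \max_{0 \ne x \in \mathcal{X}} \frac{x^T A x}{x^T x} = \min_{\substack{\mathcal{X} \subseteq R^n \\ \dim(\mathcal{X}) = n-k+1}} \; \max_{\substack{x \in \mathcal{X} \\ \|x\| = 1}} x^T A x,
\end{aligned}
\]

where $\mathcal{X}$ denotes a subspace of $R^n$ and $\|x\|$ is the Euclidean norm of x in $R^n$. In
particular,

\[ \lambda_1(A) = \max_{\|x\| = 1} x^T A x, \quad \lambda_n(A) = \min_{\|x\| = 1} x^T A x. \]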
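Proof Let $\dim(\mathcal{X}) = k$. Suppose that $u_1, u_2, \ldots, u_n$ are the orthonormal
eigenvectors of A corresponding to $\lambda_1(A), \lambda_2(A), \ldots, \lambda_n(A)$, respectively. Let
$\mathcal{Y} = \mathrm{span}\{u_k, u_{k+1}, \ldots, u_n\}$. We have

\[ \dim(\mathcal{X}) + \dim(\mathcal{Y}) = n + 1. \]

Note that by Theorem 4.14,

\[ \dim(\mathcal{X} \cap \mathcal{Y}) = \dim(\mathcal{X}) + \dim(\mathcal{Y}) - \dim(\mathcal{X} + \mathcal{Y}) \ge n + 1 - n = 1. \]

We have for any $x \in \mathcal{X} \cap \mathcal{Y}$ with $\|x\| = 1$,

\[ x = \sum_{j=k}^{n} \xi_j u_j, \quad \sum_{j=k}^{n} |\xi_j|^2 = 1. \]

Then

\[ x^T A x = \sum_{j=k}^{n} |\xi_j|^2 \lambda_j(A) \le \sum_{j=k}^{n} |\xi_j|^2 \lambda_k(A) = \lambda_k(A). \]

Hence

\[ \min_{\substack{x \in \mathcal{X} \\ \|x\| = 1}} x^T A x \le \min_{\substack{x \in \mathcal{X} \cap \mathcal{Y} \\ \|x\| = 1}} x^T A x \le \lambda_k(A). \]

On the other hand, if we take

\[ \mathcal{X}_0 = \mathrm{span}\{u_1, u_2, \ldots, u_k\}, \]

then $\dim(\mathcal{X}_0) = k$ and we obtain

\[ \min_{\substack{x \in \mathcal{X}_0 \\ \|x\| = 1}} x^T A x = u_k^T A u_k = u_k^T \lambda_k(A) u_k = \lambda_k(A). \]

Thus,

\[ \lambda_k(A) = \max_{\substack{\mathcal{X} \subseteq R^n \\ \dim(\mathcal{X}) = k}} \; \min_{\substack{x \in \mathcal{X} \\ \|x\| = 1}} x^T A x. \]

Applying the above equality to $-A$, and noting that

\[ -\lambda_k(A) = \lambda_{n-k+1}(-A), \quad 1 \le k \le n, \]

one can deduce

\[
\begin{aligned}
\lambda_k(A) = -\lambda_{n-k+1}(-A) &= - \max_{\substack{\mathcal{X} \subseteq R^n \\ \dim(\mathcal{X}) = n-k+1}} \; \min_{\substack{x \in \mathcal{X} \\ \|x\| = 1}} x^T (-A) x \\
&= - \max_{\substack{\mathcal{X} \subseteq R^n \\ \dim(\mathcal{X}) = n-k+1}} \Big( - \max_{\substack{x \in \mathcal{X} \\ \|x\| = 1}} x^T A x \Big) = \min_{\substack{\mathcal{X} \subseteq R^n \\ \dim(\mathcal{X}) = n-k+1}} \; \max_{\substack{x \in \mathcal{X} \\ \|x\| = 1}} x^T A x.
\end{aligned}
\]

In particular, we have

\[ \lambda_1(A) = \max_{\|x\| = 1} x^T A x, \quad \lambda_n(A) = \min_{\|x\| = 1} x^T A x, \]

which coincide with Theorem 8.1 (b).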


Theorem 8.6 (Cauchy's Interlace Theorem) Let A be an $n \times n$ symmetric matrix 
whose eigenvalues in decreasing size order are 

\[
\lambda_1(A) \ge \lambda_2(A) \ge \cdots \ge \lambda_n(A).
\]

Let B be any $m \times m$ principal submatrix of A whose eigenvalues in decreasing size 
order are 

\[
\mu_1(B) \ge \mu_2(B) \ge \cdots \ge \mu_m(B).
\]

Then for $1 \le j \le m$, 

\[
\lambda_j(A) \ge \mu_j(B) \ge \lambda_{j+n-m}(A).
\]


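Proof We can assume that A is given in the following form 

\[
A = \begin{bmatrix} B & C \\ C^T & D \end{bmatrix}.
\]

In fact, we can always apply a similarity transformation to A by permutation matrices 
if necessary. By using Theorem 8.5, there exists a subspace $\mathcal{X} \subseteq \mathbb{R}^m$ with $\dim(\mathcal{X}) = j$ 
which satisfies 

\[
\mu_j(B) = \min_{\substack{\mathbf{x} \in \mathcal{X} \\ \|\mathbf{x}\| = 1}} \mathbf{x}^T B \mathbf{x}.
\]

For any $\mathbf{x} \in \mathbb{R}^m$, we construct 
$\tilde{\mathbf{x}} = \begin{bmatrix} \mathbf{x} \\ \mathbf{0} \end{bmatrix} \in \mathbb{R}^n$. 
Let $\tilde{\mathcal{X}} = \{\tilde{\mathbf{x}} \mid \mathbf{x} \in \mathcal{X}\} \subseteq \mathbb{R}^n$. Then 
$\dim(\tilde{\mathcal{X}}) = j$. Moreover, 

\[
\mathbf{x}^T B \mathbf{x} = \tilde{\mathbf{x}}^T A \tilde{\mathbf{x}}.
\]

We have by Theorem 8.5 again, 

\[
\mu_j(B) = \min_{\substack{\mathbf{x} \in \mathcal{X} \\ \|\mathbf{x}\| = 1}} \mathbf{x}^T B \mathbf{x}
= \min_{\substack{\tilde{\mathbf{x}} \in \tilde{\mathcal{X}} \\ \|\tilde{\mathbf{x}}\| = 1}} \tilde{\mathbf{x}}^T A \tilde{\mathbf{x}}
\le \max_{\substack{\mathcal{Y} \subseteq \mathbb{R}^n \\ \dim(\mathcal{Y}) = j}} \ \min_{\substack{\mathbf{y} \in \mathcal{Y} \\ \|\mathbf{y}\| = 1}} \mathbf{y}^T A \mathbf{y} = \lambda_j(A).
\]

Applying the above result to $-A$ and $-B$, and noting that 

\[
-\lambda_i(A) = \lambda_{n-i+1}(-A), \qquad 1 \le i \le n, \tag{8.6}
\]

and 

\[
-\mu_j(B) = \mu_{m-j+1}(-B), \qquad 1 \le j \le m, \tag{8.7}
\]

we have, by taking $i = j + n - m$ in (8.6) and then using (8.7), 

\[
-\lambda_{j+n-m}(A) = \lambda_{n-(j+n-m)+1}(-A) = \lambda_{m-j+1}(-A) \ge \mu_{m-j+1}(-B) = -\mu_j(B),
\]

i.e., 

\[
\lambda_{j+n-m}(A) \le \mu_j(B).
\]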
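The interlacing inequalities can be checked by hand in the smallest case $n = 2$, $m = 1$, where each diagonal entry of A is a $1 \times 1$ principal submatrix. The following is a minimal sketch under that assumption; the helper names `eig_sym_2x2` and `interlaces` and the sample entries are illustrative choices, not taken from the text.

```python
import math

def eig_sym_2x2(a, b, d):
    """Eigenvalues (largest first) of the real symmetric matrix [[a, b], [b, d]]."""
    mid = (a + d) / 2.0
    rad = math.sqrt(((a - d) / 2.0) ** 2 + b * b)
    return mid + rad, mid - rad

def interlaces(a, b, d):
    """Check lambda_1(A) >= mu >= lambda_2(A) for both 1 x 1 principal submatrices."""
    lam1, lam2 = eig_sym_2x2(a, b, d)
    return all(lam1 >= mu >= lam2 for mu in (a, d))

# Sample matrix A = [[2, 1], [1, 3]] (an arbitrary illustrative choice).
lam1, lam2 = eig_sym_2x2(2.0, 1.0, 3.0)
print(lam1, lam2, interlaces(2.0, 1.0, 3.0))
```

Here $j = 1$, so the theorem reads $\lambda_1(A) \ge \mu_1(B) \ge \lambda_2(A)$ with $\mu_1(B)$ equal to a diagonal entry of A.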


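Theorem 8.7 (Weyl's Theorem) Let A and B be $n \times n$ symmetric matrices whose 
eigenvalues in decreasing size order are 

\[
\lambda_1(A) \ge \lambda_2(A) \ge \cdots \ge \lambda_n(A), \qquad \lambda_1(B) \ge \lambda_2(B) \ge \cdots \ge \lambda_n(B),
\]

respectively. Let $\lambda_1(A+B), \lambda_2(A+B), \ldots, \lambda_n(A+B)$ denote the eigenvalues of 
$A + B$ in decreasing size order as 

\[
\lambda_1(A+B) \ge \lambda_2(A+B) \ge \cdots \ge \lambda_n(A+B).
\]

Then for all $1 \le j \le n$, 

\[
\max_{r+s=j+n} \{\lambda_r(A) + \lambda_s(B)\} \le \lambda_j(A+B) \le \min_{r+s=j+1} \{\lambda_r(A) + \lambda_s(B)\}.
\]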


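Proof We prove the left inequality first. Let $r + s = j + n$. By Theorem 8.5, there 
exist two subspaces $\mathcal{X}$ and $\mathcal{Y}$ in $\mathbb{R}^n$ with $\dim(\mathcal{X}) = r$ and $\dim(\mathcal{Y}) = s$ such that 

\[
\lambda_r(A) = \min_{\substack{\mathbf{x} \in \mathcal{X} \\ \|\mathbf{x}\| = 1}} \mathbf{x}^T A \mathbf{x}, \qquad
\lambda_s(B) = \min_{\substack{\mathbf{x} \in \mathcal{Y} \\ \|\mathbf{x}\| = 1}} \mathbf{x}^T B \mathbf{x}.
\]

Since 

\[
\dim(\mathcal{X} \cap \mathcal{Y}) = \dim(\mathcal{X}) + \dim(\mathcal{Y}) - \dim(\mathcal{X} + \mathcal{Y}) \ge r + s - n = j,
\]

there exists a subspace $\mathcal{T}_0 \subseteq \mathcal{X} \cap \mathcal{Y}$ which satisfies $\dim(\mathcal{T}_0) = j$. Thus, 

\[
\begin{aligned}
\lambda_j(A+B)
&= \max_{\substack{\mathcal{T} \subseteq \mathbb{R}^n \\ \dim(\mathcal{T}) = j}} \ \min_{\substack{\mathbf{x} \in \mathcal{T} \\ \|\mathbf{x}\| = 1}} \mathbf{x}^T (A+B) \mathbf{x}
 \ge \min_{\substack{\mathbf{x} \in \mathcal{T}_0 \\ \|\mathbf{x}\| = 1}} \big( \mathbf{x}^T A \mathbf{x} + \mathbf{x}^T B \mathbf{x} \big) \\
&\ge \min_{\substack{\mathbf{x} \in \mathcal{T}_0 \\ \|\mathbf{x}\| = 1}} \mathbf{x}^T A \mathbf{x} + \min_{\substack{\mathbf{x} \in \mathcal{T}_0 \\ \|\mathbf{x}\| = 1}} \mathbf{x}^T B \mathbf{x}
 \ge \min_{\substack{\mathbf{x} \in \mathcal{X} \\ \|\mathbf{x}\| = 1}} \mathbf{x}^T A \mathbf{x} + \min_{\substack{\mathbf{x} \in \mathcal{Y} \\ \|\mathbf{x}\| = 1}} \mathbf{x}^T B \mathbf{x} \\
&= \lambda_r(A) + \lambda_s(B).
\end{aligned}
\]

Applying the above inequality to $-A - B$, and noting that 

\[
\begin{aligned}
\lambda_j(-A-B) &= -\lambda_{n-j+1}(A+B), & 1 \le j \le n, \\
\lambda_r(-A) &= -\lambda_{n-r+1}(A), & 1 \le r \le n, \\
\lambda_s(-B) &= -\lambda_{n-s+1}(B), & 1 \le s \le n,
\end{aligned}
\]

we deduce 

\[
-\lambda_{n-j+1}(A+B) = \lambda_j(-A-B) \ge \lambda_r(-A) + \lambda_s(-B) = -\lambda_{n-r+1}(A) - \lambda_{n-s+1}(B). \tag{8.8}
\]

Let $j' = n - j + 1$, $r' = n - r + 1$, and $s' = n - s + 1$. Then (8.8) can be simplified to 

\[
\lambda_{j'}(A+B) \le \lambda_{r'}(A) + \lambda_{s'}(B),
\]

where, since $r + s = j + n$, 

\[
r' + s' = (n - r + 1) + (n - s + 1) = (n - j + 1) + 1 = j' + 1.
\]

Thus, the right inequality holds. 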
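Weyl's bounds can be illustrated numerically for $n = 2$, where eigenvalues of symmetric matrices have a closed form. The sketch below is only a sanity check on one pair of sample matrices; the matrices A and B and the helper names `eig_sym_2x2` and `weyl_bounds` are assumptions made for illustration.

```python
import math

def eig_sym_2x2(M):
    """Eigenvalues (largest first) of a real symmetric 2 x 2 matrix M."""
    (a, b), (_, d) = M
    mid = (a + d) / 2.0
    rad = math.sqrt(((a - d) / 2.0) ** 2 + b * b)
    return [mid + rad, mid - rad]

def weyl_bounds(lam_a, lam_b, j):
    """Lower and upper Weyl bounds for lambda_j(A + B), with 1-based index j."""
    n = len(lam_a)
    pairs = [(r, s) for r in range(1, n + 1) for s in range(1, n + 1)]
    lower = max(lam_a[r - 1] + lam_b[s - 1] for r, s in pairs if r + s == j + n)
    upper = min(lam_a[r - 1] + lam_b[s - 1] for r, s in pairs if r + s == j + 1)
    return lower, upper

# Sample symmetric matrices (arbitrary illustrative choices).
A = [[2.0, 1.0], [1.0, 3.0]]
B = [[1.0, -2.0], [-2.0, 0.0]]
S = [[A[i][k] + B[i][k] for k in range(2)] for i in range(2)]

lam_a, lam_b, lam_s = eig_sym_2x2(A), eig_sym_2x2(B), eig_sym_2x2(S)
weyl_ok = all(
    weyl_bounds(lam_a, lam_b, j)[0] <= lam_s[j - 1] <= weyl_bounds(lam_a, lam_b, j)[1]
    for j in (1, 2)
)
print(lam_s, weyl_ok)
```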


8.3 Complex Inner Product Spaces 


A complete presentation of linear algebra must include complex numbers. We 
therefore review some basic knowledge of complex numbers before we study complex 
inner product spaces. 


8.3.1 Complex numbers 


Definition A complex number z is defined by 

\[
z := a + bi,
\]

where a and b are real numbers, and $i^2 = -1$. The real numbers a and b are called 
the real and imaginary parts of z, respectively. 

Let $z_1 = a + bi$ and $z_2 = c + di$ be two complex numbers. Then $z_1$ and $z_2$ are 
said to be equal if and only if their real parts are equal and their imaginary parts 
are equal, i.e., 

\[
z_1 = z_2 \iff a = c \ \text{and} \ b = d.
\]

Also, $z_1$ and $z_2$ can be added, subtracted, and multiplied in accordance with the 
standard rules of algebra but with $i^2 = -1$. For instance, 

\[
z_1 \pm z_2 = (a + bi) \pm (c + di) = (a \pm c) + (b \pm d)i
\]

and 

\[
z_1 \cdot z_2 = (a + bi) \cdot (c + di) = (ac - bd) + (ad + bc)i.
\]


Definition For a complex number $z = a + bi$, the complex conjugate of z, denoted 
by the symbol $\bar{z}$, is defined by 

\[
\bar{z} := a - bi.
\]

The modulus of a complex number $z = a + bi$, denoted by $|z|$, is defined by 

\[
|z| := \sqrt{a^2 + b^2}.
\]

The following theorem establishes some essential properties of complex numbers. 


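Theorem 8.8 Let $z$, $z_1$, and $z_2$ be any complex numbers. Then 

(a) $\overline{z_1 \pm z_2} = \bar{z}_1 \pm \bar{z}_2$. 

(b) $\overline{z_1 \cdot z_2} = \bar{z}_1 \cdot \bar{z}_2$. 

(d) $z \cdot \bar{z} = |z|^2$. 

Proof We only prove (a) and (d). The proofs of (b) and (c) are left as an exercise. 
For (a), let $z_1 = a_1 + b_1 i$ and $z_2 = a_2 + b_2 i$. Then 

\[
\begin{aligned}
\overline{z_1 \pm z_2} &= \overline{(a_1 + b_1 i) \pm (a_2 + b_2 i)} = \overline{(a_1 \pm a_2) + (b_1 \pm b_2) i} \\
&= (a_1 \pm a_2) - (b_1 \pm b_2) i = (a_1 - b_1 i) \pm (a_2 - b_2 i) \\
&= \bar{z}_1 \pm \bar{z}_2.
\end{aligned}
\]

For (d), let $z = a + bi$. Then 

\[
\begin{aligned}
z \cdot \bar{z} &= (a + bi) \cdot \overline{(a + bi)} = (a + bi) \cdot (a - bi) \\
&= a^2 - abi + abi - b^2 i^2 = a^2 + b^2 \\
&= |z|^2.
\end{aligned}
\]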
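The conjugate and modulus rules above can be checked on concrete numbers with Python's built-in `complex` type. This is a small sketch; the sample values `z1` and `z2` and the helper names `conj` and `modulus` are illustrative assumptions, not part of the text.

```python
def conj(z: complex) -> complex:
    """Complex conjugate of z = a + bi, i.e. a - bi."""
    return complex(z.real, -z.imag)

def modulus(z: complex) -> float:
    """Modulus |z| = sqrt(a^2 + b^2)."""
    return (z.real ** 2 + z.imag ** 2) ** 0.5

# Arbitrary sample values with small integer parts, so the arithmetic is exact.
z1 = 3 + 4j
z2 = 1 - 2j

sum_ok = conj(z1 + z2) == conj(z1) + conj(z2)    # conjugate of a sum
diff_ok = conj(z1 - z2) == conj(z1) - conj(z2)   # conjugate of a difference
prod_ok = conj(z1 * z2) == conj(z1) * conj(z2)   # conjugate of a product
mod_ok = z1 * conj(z1) == complex(modulus(z1) ** 2, 0.0)  # z times its conjugate
print(sum_ok, diff_ok, prod_ok, mod_ok)
```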

8.3.2 Complex inner product spaces 


In the definition of a general vector space V in Subsection 4.1.1, if the scalars are in 

\[
\mathbb{C} := \{a + bi \mid a, b \in \mathbb{R}, \ i := \sqrt{-1}\},
\]


then V is called a complex vector space. The notions of linear combination, linear 
independence, spanning sets, basis, dimension, and subspace carry over without 
change to complex vector spaces. Moreover, the theorems developed in previous 
chapters for real vector spaces continue to hold with real vector spaces changed to 
complex vector spaces. 


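Definition An inner product on a complex vector space V is a function that associates 
a complex number with each pair of vectors u and v in V, denoted by $\langle \mathbf{u}, \mathbf{v} \rangle$, 
in such a way that the following axioms are satisfied for all vectors u, v, and w in 
V and all scalars k in $\mathbb{C}$. 

(i) $\langle \mathbf{u}, \mathbf{v} \rangle = \overline{\langle \mathbf{v}, \mathbf{u} \rangle}$. 

(ii) $\langle \mathbf{u} + \mathbf{v}, \mathbf{w} \rangle = \langle \mathbf{u}, \mathbf{w} \rangle + \langle \mathbf{v}, \mathbf{w} \rangle$. 

(iii) $\langle k\mathbf{u}, \mathbf{v} \rangle = k \langle \mathbf{u}, \mathbf{v} \rangle$. 

(iv) $\langle \mathbf{v}, \mathbf{v} \rangle \ge 0$; $\langle \mathbf{v}, \mathbf{v} \rangle = 0$ if and only if $\mathbf{v} = \mathbf{0}$. 

A complex vector space with an inner product is called a complex inner product 
space. 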


Remark In a complex inner product space V, the norm of a vector $\mathbf{u} \in V$ is defined 
by 

\[
\|\mathbf{u}\| = \langle \mathbf{u}, \mathbf{u} \rangle^{1/2}.
\]


The Cauchy-Schwarz inequality is also available for complex inner product spaces. 
Moreover, the definitions of orthogonal set, orthonormal set, orthogonal basis, and 
orthonormal basis carry over to complex inner product spaces without change. The 
Gram-Schmidt process can be used to convert an arbitrary basis into an orthogonal 
(or orthonormal) basis for a complex inner product space. 


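Example Let $\mathbb{C}^n := \{(c_1, c_2, \ldots, c_n) \mid c_i \in \mathbb{C}\}$ with the operations of vector 
addition and scalar multiplication. For vectors $\mathbf{u} = (u_1, u_2, \ldots, u_n)$ and 
$\mathbf{v} = (v_1, v_2, \ldots, v_n)$ in $\mathbb{C}^n$, the complex Euclidean inner product $\langle \mathbf{u}, \mathbf{v} \rangle$ is defined by 

\[
\langle \mathbf{u}, \mathbf{v} \rangle := \sum_{i=1}^{n} u_i \bar{v}_i,
\]

which satisfies the four axioms of the inner product. The complex vector space $\mathbb{C}^n$ 
with this inner product is called the complex Euclidean space. We can define the norm 
and distance as follows: 

\[
\|\mathbf{u}\| = \langle \mathbf{u}, \mathbf{u} \rangle^{1/2} = \Big( \sum_{i=1}^{n} |u_i|^2 \Big)^{1/2}, \qquad
d(\mathbf{u}, \mathbf{v}) = \|\mathbf{u} - \mathbf{v}\| = \Big( \sum_{i=1}^{n} |u_i - v_i|^2 \Big)^{1/2}.
\]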
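As a concrete illustration, the complex Euclidean inner product can be computed in a few lines. The sketch below assumes the convention that the second argument is conjugated, which matches axiom (iii); the vectors `u`, `v` and the scalar `k` are arbitrary sample values.

```python
def inner(u, v):
    """Complex Euclidean inner product <u, v> = sum_i u_i * conj(v_i)."""
    return sum(ui * vi.conjugate() for ui, vi in zip(u, v))

def norm(u):
    """Norm ||u|| = <u, u>^(1/2); <u, u> is real and nonnegative."""
    return inner(u, u).real ** 0.5

def dist(u, v):
    """Distance d(u, v) = ||u - v||."""
    return norm([ui - vi for ui, vi in zip(u, v)])

# Arbitrary sample vectors in C^2 and a sample scalar.
u = [1 + 1j, 2j]
v = [1j, 1 - 1j]
k = 2 - 3j

conjugate_symmetric = inner(u, v) == inner(v, u).conjugate()          # axiom (i)
homogeneous = inner([k * ui for ui in u], v) == k * inner(u, v)       # axiom (iii)
print(inner(u, v), norm(u), dist(u, v), conjugate_symmetric, homogeneous)
```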


Let $\mathbb{C}^{m \times n}$ denote the vector space of all $m \times n$ complex matrices with the 
operations of matrix addition and scalar multiplication. In fact, almost all the 
concepts concerned with the matrix operations can be generalized from real matrices 
to complex matrices straightforwardly. Let $A = [a_{ij}] \in \mathbb{C}^{m \times n}$, i.e., $a_{ij} \in \mathbb{C}$ for any 
i and j. Then the matrix defined by 

\[
A^* = \bar{A}^T = [\bar{a}_{ji}]
\]

is called the conjugate transpose of A. We have the following theorem concerning 
some basic properties of $A^*$. The proof of the theorem is straightforward and we therefore 
omit it. 


Theorem 8.9 Let A and B be complex matrices and k be any complex number. 
Then 

(a) $(A^*)^* = A$. (b) $(A + B)^* = A^* + B^*$. (c) $(kA)^* = \bar{k} A^*$. (d) $(AB)^* = B^* A^*$. 


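Example An inner product on $\mathbb{C}^{n \times n}$ is defined by 

\[
\langle X, Y \rangle := \operatorname{tr}(X Y^*),
\]

where $X, Y \in \mathbb{C}^{n \times n}$. One can check easily that $\langle X, Y \rangle$ satisfies the four axioms of 
the inner product. The Frobenius norm for any $X = [x_{ij}] \in \mathbb{C}^{n \times n}$ is defined as 

\[
\|X\|_F := \langle X, X \rangle^{1/2} = [\operatorname{tr}(X X^*)]^{1/2} = \Big( \sum_{i=1}^{n} \sum_{j=1}^{n} |x_{ij}|^2 \Big)^{1/2}.
\]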
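The two expressions for the Frobenius norm, through the trace of $X X^*$ and through the sum of squared moduli of the entries, can be compared directly. The following sketch uses nested lists as matrices; the helper names and the sample matrix `X` are illustrative assumptions.

```python
def conj_transpose(A):
    """Conjugate transpose A* of a matrix given as a list of rows."""
    return [[A[i][j].conjugate() for i in range(len(A))] for j in range(len(A[0]))]

def matmul(A, B):
    """Matrix product AB for matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def trace(A):
    """Sum of the diagonal entries of a square matrix."""
    return sum(A[i][i] for i in range(len(A)))

def frobenius_sq_trace(X):
    """||X||_F^2 computed as tr(X X*)."""
    return trace(matmul(X, conj_transpose(X))).real

def frobenius_sq_entries(X):
    """||X||_F^2 computed as the sum of |x_ij|^2."""
    return sum(abs(x) ** 2 for row in X for x in row)

# Arbitrary sample matrix.
X = [[1 + 1j, 2], [0, -1j]]
print(frobenius_sq_trace(X), frobenius_sq_entries(X))
```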

8.4 Hermitian Matrices and Unitary Matrices 


We study Hermitian matrices and unitary matrices in this section. 


Definition If a square matrix A with complex entries satisfies $A = A^*$, then A is 
called a Hermitian matrix. If a square matrix A with complex entries satisfies 
$A^{-1} = A^*$, i.e., 

\[
A^* A = A A^* = I,
\]

then A is called a unitary matrix. 


Theorem 8.10 Let A be an $n \times n$ complex matrix. Then the following are equivalent. 

(a) A is unitary. 

(b) The row (or column) vectors of A form an orthonormal set in $\mathbb{C}^n$ with respect 
to the Euclidean inner product. 


The proof of Theorem 8.10 is similar to that of Theorem 5.18 and is left as an 
exercise. 


We note that our earlier definitions of eigenvalue, eigenvector, eigenspace, 
characteristic equation, and characteristic polynomial carry over without change 
to complex matrices. 


For a square matrix A with complex entries, if there exists a unitary matrix P 
such that 
P*AP=D, 


where D is a diagonal matrix, then A is called unitarily diagonalizable. 

Theorem 8.11 If A is a Hermitian matrix, then A is unitarily diagonalizable. 

The proof of Theorem 8.11 is left as an exercise. 

Theorem 8.12 Let A be Hermitian. Then 

(a) The eigenvalues of A are all real. 

(b) Eigenvectors from different eigenspaces are orthogonal. 


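Proof For (a), let $\lambda$ be an eigenvalue of a Hermitian matrix A and $\mathbf{v}$ be a 
corresponding eigenvector. Then 

\[
A\mathbf{v} = \lambda \mathbf{v}.
\]

Multiplying both sides by $\mathbf{v}^*$ yields 

\[
\mathbf{v}^* A \mathbf{v} = \lambda \mathbf{v}^* \mathbf{v},
\]

and then 

\[
\lambda = \frac{\mathbf{v}^* A \mathbf{v}}{\|\mathbf{v}\|^2}.
\]

Therefore, 

\[
\bar{\lambda} = \Big( \frac{\mathbf{v}^* A \mathbf{v}}{\|\mathbf{v}\|^2} \Big)^* = \frac{\mathbf{v}^* A^* \mathbf{v}}{\|\mathbf{v}\|^2} = \frac{\mathbf{v}^* A \mathbf{v}}{\|\mathbf{v}\|^2} = \lambda,
\]

so $\lambda$ is real. 

For (b), let $\mathbf{v}_1$ and $\mathbf{v}_2$ be eigenvectors corresponding to distinct eigenvalues $\lambda_1$ and 
$\lambda_2$ of A. Then we have by (a), 

\[
\lambda_1 \langle \mathbf{v}_1, \mathbf{v}_2 \rangle = \langle A\mathbf{v}_1, \mathbf{v}_2 \rangle = \mathbf{v}_2^* A \mathbf{v}_1 = (A^* \mathbf{v}_2)^* \mathbf{v}_1 = (A \mathbf{v}_2)^* \mathbf{v}_1 = \lambda_2 \mathbf{v}_2^* \mathbf{v}_1 = \lambda_2 \langle \mathbf{v}_1, \mathbf{v}_2 \rangle,
\]

which implies 

\[
(\lambda_1 - \lambda_2) \langle \mathbf{v}_1, \mathbf{v}_2 \rangle = 0.
\]

Since $\lambda_1 - \lambda_2 \ne 0$, we have 

\[
\langle \mathbf{v}_1, \mathbf{v}_2 \rangle = 0.
\]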


Remark If A is a real symmetric matrix, then A is also Hermitian. Therefore, the 
results in Theorem 8.12 hold for all real symmetric matrices. See Theorem 6.7. 


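Example The matrix 

\[
A = \begin{bmatrix} 4 & 1-i \\ 1+i & 5 \end{bmatrix}
\]

is unitarily diagonalizable because it is Hermitian. Find a matrix P that unitarily 
diagonalizes A. 

Solution The characteristic equation of A is 

\[
\det(\lambda I - A) = \det \begin{bmatrix} \lambda - 4 & -1+i \\ -1-i & \lambda - 5 \end{bmatrix} = \lambda^2 - 9\lambda + 18 = (\lambda - 3)(\lambda - 6) = 0
\]

and the eigenvalues are $\lambda = 3$ and $\lambda = 6$. The corresponding eigenvectors are given 
as follows: 

\[
\lambda = 3, \ \mathbf{v}_1 = \begin{bmatrix} -1+i \\ 1 \end{bmatrix}; \qquad
\lambda = 6, \ \mathbf{v}_2 = \begin{bmatrix} \frac{1-i}{2} \\ 1 \end{bmatrix}.
\]

Since each eigenspace has only one basis vector, we have $\langle \mathbf{v}_1, \mathbf{v}_2 \rangle = 0$ by Theorem 
8.12. Normalizing these basis vectors yields 

\[
\mathbf{p}_1 = \frac{\mathbf{v}_1}{\|\mathbf{v}_1\|} = \begin{bmatrix} \frac{-1+i}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \end{bmatrix}, \qquad
\mathbf{p}_2 = \frac{\mathbf{v}_2}{\|\mathbf{v}_2\|} = \begin{bmatrix} \frac{1-i}{\sqrt{6}} \\ \frac{2}{\sqrt{6}} \end{bmatrix}.
\]

Thus, A is unitarily diagonalized by the matrix 

\[
P = [\mathbf{p}_1 \ \mathbf{p}_2] = \begin{bmatrix} \frac{-1+i}{\sqrt{3}} & \frac{1-i}{\sqrt{6}} \\ \frac{1}{\sqrt{3}} & \frac{2}{\sqrt{6}} \end{bmatrix}.
\]

It is easy to verify 

\[
P^* A P = \begin{bmatrix} 3 & 0 \\ 0 & 6 \end{bmatrix}.
\]


Theorem 8.13 Let $\lambda_1, \lambda_2, \ldots, \lambda_n$ be eigenvalues of an $n \times n$ Hermitian matrix A. 
Then 

\[
\|A\|_F^2 = \lambda_1^2 + \lambda_2^2 + \cdots + \lambda_n^2.
\]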


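Proof It follows from Theorem 8.11 and Theorem 8.12 (a) that there exists a unitary 
matrix P such that 

\[
P^* A P = D,
\]

where $D = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$ with $\lambda_k \in \mathbb{R}$ ($1 \le k \le n$). Hence 

\[
\begin{aligned}
\|A\|_F^2 &= \|P D P^*\|_F^2 = \operatorname{tr}[(P D P^*)(P D P^*)^*] = \operatorname{tr}(P D D^* P^*) \\
&= \operatorname{tr}(D D^* P^* P) = \operatorname{tr}(D D^*) = \|D\|_F^2 \\
&= \lambda_1^2 + \lambda_2^2 + \cdots + \lambda_n^2.
\end{aligned}
\]

Here we used the property $\operatorname{tr}(VW) = \operatorname{tr}(WV)$ for all $V, W \in \mathbb{C}^{n \times n}$. 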
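The Hermitian example $A = \begin{bmatrix} 4 & 1-i \\ 1+i & 5 \end{bmatrix}$ with eigenvalues 3 and 6 offers a quick check of both Theorem 8.12 and Theorem 8.13. In the sketch below the eigenvector for $\lambda = 6$ is taken as $[1-i, 2]$, a scalar multiple chosen here to keep the arithmetic in integers; the helper names are assumptions made for illustration.

```python
A = [[4, 1 - 1j], [1 + 1j, 5]]

def apply(M, v):
    """Matrix-vector product Mv."""
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]

def inner(u, v):
    """Complex Euclidean inner product <u, v> = sum_i u_i * conj(v_i)."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def is_eigenpair(M, v, lam):
    """True if Mv equals lam * v exactly (safe here: all parts are small integers)."""
    return apply(M, v) == [lam * x for x in v]

v1 = [-1 + 1j, 1]   # eigenvector for lambda = 3
v2 = [1 - 1j, 2]    # eigenvector for lambda = 6 (rescaled, see the note above)

# Squared Frobenius norm of A as the sum of |a_ij|^2.
frob_sq = sum((x * x.conjugate()).real for row in A for x in row)

print(is_eigenpair(A, v1, 3), is_eigenpair(A, v2, 6))  # real eigenvalues
print(inner(v1, v2) == 0)                              # orthogonal eigenvectors
print(frob_sq == 3 ** 2 + 6 ** 2)                      # Theorem 8.13
```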


Finally, for certain Hermitian matrices, we introduce the following definition. 

Definition A Hermitian matrix A is called 
(i) positive definite if $\mathbf{x}^* A \mathbf{x} > 0$ for all $\mathbf{x} \ne \mathbf{0}$. 
(ii) positive semidefinite if $\mathbf{x}^* A \mathbf{x} \ge 0$ for all $\mathbf{x}$. 
(iii) negative definite if $\mathbf{x}^* A \mathbf{x} < 0$ for all $\mathbf{x} \ne \mathbf{0}$. 
(iv) negative semidefinite if $\mathbf{x}^* A \mathbf{x} \le 0$ for all $\mathbf{x}$. 


Theorem 8.14 A Hermitian matrix A is positive definite (or semidefinite) if and 
only if all the eigenvalues of A are positive (or nonnegative). 


The proof of the theorem is similar to that of Theorem 8.2 and is left as an exercise. 


Remark The results of Courant-Fischer’s Minimax Theorem, Cauchy’s Interlace 
Theorem, and Weyl’s Theorem in Section 8.2 also hold for Hermitian matrices. 


8.5 Böttcher-Wenzel Conjecture 


In the final section of the book, we study the Böttcher-Wenzel conjecture. 


8.5.1 Introduction 


A fundamental fact in matrix theory is that the matrix product is not commutative, 
i.e., there are $n \times n$ matrices X and Y such that 

\[
XY \ne YX.
\]

See Example 2 in Subsection 1.3.1. The difference $XY - YX$ is called the 
commutator or Lie product of X and Y. The commutator plays an important role 
in diverse areas of mathematics, for instance, Lie algebra and Lie group theory [3] and 
matrix computation [12]. Böttcher and Wenzel [5] proposed the following conjecture 
in 2005: for all $n \times n$ matrices X and Y, the Frobenius norm of the commutator 
is bounded by 

\[
\|XY - YX\|_F \le \sqrt{2}\, \|X\|_F \|Y\|_F.
\]

Note that the constant $\sqrt{2}$ is best possible, as shown by the simple example 

\[
X = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \qquad Y = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}.
\]

The conjecture was first proved for all $n \times n$ real matrices in 2008 by Vong and 
Jin [24]. Later, the result was generalized to complex matrices [2, 6, 9]. The 
result is important and fundamental, as reflected by the fact that it was 
promptly included in the encyclopedic book [4]. 
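For the $2 \times 2$ example above, the commutator and the two sides of the bound can be computed directly, which shows that equality is attained. This is a minimal sketch with matrices stored as nested lists; the helper names are illustrative.

```python
import math

def matmul(A, B):
    """Matrix product AB for square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def commutator(X, Y):
    """Commutator XY - YX."""
    XY, YX = matmul(X, Y), matmul(Y, X)
    n = len(X)
    return [[XY[i][j] - YX[i][j] for j in range(n)] for i in range(n)]

def frobenius(M):
    """Frobenius norm: square root of the sum of |m_ij|^2."""
    return math.sqrt(sum(abs(x) ** 2 for row in M for x in row))

X = [[0, 1], [0, 0]]
Y = [[0, 0], [1, 0]]

C = commutator(X, Y)
lhs = frobenius(C)
rhs = math.sqrt(2) * frobenius(X) * frobenius(Y)
print(C, lhs, rhs)
```

Both sides equal $\sqrt{2}$ here, so no smaller constant can work for all matrices.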


8.5.2 Proof of the Böttcher-Wenzel conjecture 


As defined in Subsection 8.3.2, the Frobenius norm is given by 

\[
\|A\|_F^2 = \langle A, A \rangle = \sum_{i=1}^{n} \sum_{j=1}^{n} |a_{ij}|^2 = \operatorname{tr}(A A^*).
\]

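In order to prove the Böttcher-Wenzel conjecture, we need the following lemmas. 

Lemma 8.1 Let $p_j \ge 0$ for $1 \le j \le n$ with $\sum_{j=1}^{n} p_j = 1$ and $q_j$ be real numbers for 
$1 \le j \le n$. Then 

\[
\sum_{j=1}^{n} p_j q_j^2 - \Big( \sum_{j=1}^{n} p_j q_j \Big)^2 \le \frac{1}{2} \sum_{j=1}^{n} q_j^2.
\]

Proof From direct calculations, we have 

\[
\sum_{j=1}^{n} p_j q_j^2 - \Big( \sum_{j=1}^{n} p_j q_j \Big)^2 = \sum_{j=1}^{n} p_j \Big[ q_j - \Big( \sum_{k=1}^{n} p_k q_k \Big) \Big]^2.
\]

Assuming that $q_1 \ge q_2 \ge \cdots \ge q_n$ and denoting $d = \frac{1}{2}(q_1 + q_n) - \sum_{k=1}^{n} p_k q_k$, we 
deduce 

\[
\begin{aligned}
\sum_{j=1}^{n} p_j \Big[ q_j - \Big( \sum_{k=1}^{n} p_k q_k \Big) \Big]^2
&= \sum_{j=1}^{n} p_j \Big[ q_j - \Big( \sum_{k=1}^{n} p_k q_k \Big) - d + d \Big]^2 \\
&= \sum_{j=1}^{n} p_j \Big[ q_j - \Big( \sum_{k=1}^{n} p_k q_k \Big) - d \Big]^2 + 2d \sum_{j=1}^{n} p_j \Big( q_j - \sum_{k=1}^{n} p_k q_k - d \Big) + d^2 \\
&= \sum_{j=1}^{n} p_j \Big[ q_j - \Big( \sum_{k=1}^{n} p_k q_k \Big) - d \Big]^2 - d^2 \\
&\le \sum_{j=1}^{n} p_j \Big( q_j - \frac{q_1 + q_n}{2} \Big)^2 \\
&\le \Big( \frac{q_1 - q_n}{2} \Big)^2 \le \frac{1}{4} (2 q_1^2 + 2 q_n^2) \le \frac{1}{2} \sum_{j=1}^{n} q_j^2.
\end{aligned}
\]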
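Lemma 8.1 says that the variance of the values $q_j$ under the weights $p_j$ is at most half the sum of the squares $q_j^2$. A randomized sanity check, which is not a proof, can be sketched as follows; the function names, the seed, and the sampling ranges are arbitrary assumptions.

```python
import random

def lemma_gap(p, q):
    """Return (1/2) * sum q_j^2 - [sum p_j q_j^2 - (sum p_j q_j)^2]."""
    mean = sum(pj * qj for pj, qj in zip(p, q))
    second_moment = sum(pj * qj * qj for pj, qj in zip(p, q))
    return 0.5 * sum(qj * qj for qj in q) - (second_moment - mean * mean)

def random_weights(n, rng):
    """Nonnegative weights p_1, ..., p_n that sum to 1."""
    w = [rng.random() for _ in range(n)]
    total = sum(w)
    return [x / total for x in w]

rng = random.Random(0)
worst = min(
    lemma_gap(random_weights(5, rng), [rng.uniform(-10, 10) for _ in range(5)])
    for _ in range(1000)
)
print(worst >= -1e-9)                       # the gap stays nonnegative
print(lemma_gap([0.5, 0.5], [1.0, -1.0]))   # equality case: the gap is 0
```

The second call shows that the bound is attained for $p = (1/2, 1/2)$ and $q = (1, -1)$.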
j= j= 


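Lemma 8.2 Let A and B be Hermitian matrices. Then the trace of AB is real. 

Proof We have 

\[
\begin{aligned}
\overline{\operatorname{tr}(AB)} &= \operatorname{tr}(\overline{AB}) = \operatorname{tr}(\bar{A}\bar{B}) = \operatorname{tr}((\bar{A}\bar{B})^T) \\
&= \operatorname{tr}(\bar{B}^T \bar{A}^T) = \operatorname{tr}(B^* A^*) = \operatorname{tr}(BA) \\
&= \operatorname{tr}(AB).
\end{aligned}
\]

Thus, the trace of AB is real. 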


Lemma 8.3 (Cartesian Decomposition [8]) Let M be any square matrix with 
complex entries. Then M can be decomposed as 

\[
M = A + iB,
\]

where A and B are Hermitian matrices and $i = \sqrt{-1}$. 

Proof Let 

\[
A = \frac{M + M^*}{2}, \qquad B = \frac{M - M^*}{2i}.
\]

Then A and B are Hermitian and $M = A + iB$. 
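The decomposition can be tried on a small matrix, taking $A = (M + M^*)/2$ and $B = (M - M^*)/(2i)$ as the Hermitian parts. In this sketch the division by $2i$ is written as multiplication by $-i/2$; the sample matrix `M` and the helper names are illustrative assumptions.

```python
def conj_transpose(M):
    """Conjugate transpose M* of a square matrix given as a list of rows."""
    n = len(M)
    return [[M[j][i].conjugate() for j in range(n)] for i in range(n)]

def cartesian(M):
    """Return Hermitian A, B with M = A + iB."""
    Ms = conj_transpose(M)
    n = len(M)
    A = [[(M[i][j] + Ms[i][j]) / 2 for j in range(n)] for i in range(n)]
    B = [[(M[i][j] - Ms[i][j]) * (-0.5j) for j in range(n)] for i in range(n)]
    return A, B

def is_hermitian(M, tol=1e-12):
    """True if M equals its conjugate transpose up to the tolerance."""
    n = len(M)
    return all(abs(M[i][j] - M[j][i].conjugate()) <= tol
               for i in range(n) for j in range(n))

# Arbitrary non-Hermitian sample matrix.
M = [[1 + 2j, 3], [1j, 4 - 1j]]
A, B = cartesian(M)
recombined = [[A[i][j] + 1j * B[i][j] for j in range(2)] for i in range(2)]
print(is_hermitian(A), is_hermitian(B), recombined)
```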


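We now state the Böttcher-Wenzel conjecture as the following theorem. The idea 
of the following proof is elementary. 

Theorem 8.15 For any $n \times n$ complex matrices X and Y, we have 

\[
\|XY - YX\|_F \le \sqrt{2}\, \|X\|_F \|Y\|_F. \tag{8.9}
\]

Proof If $X = 0$, then (8.9) holds obviously. Now suppose $X \ne 0$ and then $\|X\|_F > 0$. 
In the following, we repeatedly use the property 

\[
\operatorname{tr}(VW) = \operatorname{tr}(WV)
\]

for all $V, W \in \mathbb{C}^{n \times n}$. We deduce 

\[
\begin{aligned}
\|XY - YX\|_F^2 &= \operatorname{tr}[(XY - YX)(XY - YX)^*] \\
&= \operatorname{tr}(XYY^*X^* - XYX^*Y^* - YXY^*X^* + YXX^*Y^*) \\
&= \operatorname{tr}(X^*XYY^* - XYX^*Y^* - YXY^*X^* + XX^*Y^*Y)
\end{aligned}
\]

and 

\[
\begin{aligned}
\|X^*Y + YX^*\|_F^2 &= \operatorname{tr}[(X^*Y + YX^*)(X^*Y + YX^*)^*] \\
&= \operatorname{tr}(X^*YY^*X + X^*YXY^* + YX^*Y^*X + YX^*XY^*) \\
&= \operatorname{tr}(XX^*YY^* + YXY^*X^* + XYX^*Y^* + X^*XY^*Y).
\end{aligned}
\]

Thus, 

\[
\begin{aligned}
\|XY - YX\|_F^2 + \|X^*Y + YX^*\|_F^2
&= \operatorname{tr}(X^*XYY^* + XX^*Y^*Y + XX^*YY^* + X^*XY^*Y) \\
&= \operatorname{tr}[(X^*X + XX^*)(Y^*Y + YY^*)]. 
\end{aligned} \tag{8.10}
\]

By using the Cauchy-Schwarz inequality, we obtain 

\[
\begin{aligned}
|\operatorname{tr}[Y(X^*X + XX^*)]| &= |\operatorname{tr}[(X^*Y + YX^*)X]| = |\langle X^*Y + YX^*, X^* \rangle| \\
&\le \|X^*\|_F \|X^*Y + YX^*\|_F = \|X\|_F \|X^*Y + YX^*\|_F.
\end{aligned}
\]

Consequently, 

\[
\|X^*Y + YX^*\|_F^2 \ge |\operatorname{tr}[Y(X^*X + XX^*)]|^2 / \|X\|_F^2. \tag{8.11}
\]

Combining (8.10) and (8.11) then gives 

\[
\|XY - YX\|_F^2 \le \operatorname{tr}[(X^*X + XX^*)(Y^*Y + YY^*)] - |\operatorname{tr}[Y(X^*X + XX^*)]|^2 / \|X\|_F^2. \tag{8.12}
\]

Let 

\[
D = (X^*X + XX^*) / (2\|X\|_F^2).
\]

We can simplify (8.12) by using D as follows: 

\[
\|XY - YX\|_F^2 \le 4\|X\|_F^2 \big[ \operatorname{tr}[D(Y^*Y + YY^*)/2] - |\operatorname{tr}(DY)|^2 \big]. \tag{8.13}
\]

Note that D is positive semidefinite with 

\[
\operatorname{tr}(D) = \operatorname{tr}(X^*X + XX^*) / (2\|X\|_F^2) = 2\|X\|_F^2 / (2\|X\|_F^2) = 1.
\]

Now, it remains to show that the right-hand side of (8.13) satisfies the following 
inequality: 

\[
\operatorname{tr}[D(Y^*Y + YY^*)/2] - |\operatorname{tr}(DY)|^2 \le \frac{\|Y\|_F^2}{2}.
\]

Following Lemma 8.3, we suppose that 

\[
Y = A + iB,
\]

where A, B are Hermitian and $i = \sqrt{-1}$. Obviously, 

\[
\frac{1}{2}(Y^*Y + YY^*) = A^2 + B^2,
\]

and then 

\[
\|Y\|_F^2 = \operatorname{tr}(YY^*) = \operatorname{tr}(A^2 + B^2) = \operatorname{tr}(AA^*) + \operatorname{tr}(BB^*) = \|A\|_F^2 + \|B\|_F^2. \tag{8.14}
\]

By Lemma 8.2, the trace of the product of two Hermitian matrices is a real 
number, and we therefore have 

\[
|\operatorname{tr}(DY)|^2 = |\operatorname{tr}(DA) + i \operatorname{tr}(DB)|^2 = [\operatorname{tr}(DA)]^2 + [\operatorname{tr}(DB)]^2.
\]

Hence 

\[
\begin{aligned}
\operatorname{tr}[D(Y^*Y + YY^*)/2] - |\operatorname{tr}(DY)|^2
&= \operatorname{tr}[D(A^2 + B^2)] - [\operatorname{tr}(DA)]^2 - [\operatorname{tr}(DB)]^2 \\
&= \big( \operatorname{tr}(DA^2) - [\operatorname{tr}(DA)]^2 \big) + \big( \operatorname{tr}(DB^2) - [\operatorname{tr}(DB)]^2 \big).
\end{aligned} \tag{8.15}
\]

It follows from (8.13) and (8.15) that 

\[
\|XY - YX\|_F^2 \le 4\|X\|_F^2 \Big[ \big( \operatorname{tr}(DA^2) - [\operatorname{tr}(DA)]^2 \big) + \big( \operatorname{tr}(DB^2) - [\operatorname{tr}(DB)]^2 \big) \Big]. \tag{8.16}
\]

Next, we want to show that for any Hermitian matrix $H \in \mathbb{C}^{n \times n}$, 

\[
\operatorname{tr}(DH^2) - [\operatorname{tr}(DH)]^2 \le \frac{\|H\|_F^2}{2}. \tag{8.17}
\]

By Theorems 8.11 and 8.12, we have 

\[
H = U \Lambda U^*,
\]

where U is unitary and $\Lambda = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$ with real numbers $\lambda_1, \lambda_2, \ldots, \lambda_n$. 
Let 

\[
P = [p_{ij}] = U^* D U.
\]

Then P is also a positive semidefinite matrix with 

\[
\operatorname{tr}(P) = \operatorname{tr}(U^* D U) = \operatorname{tr}(D) = 1.
\]

Thus, for every column vector $\mathbf{e}_j$ of the $n \times n$ identity matrix, it follows from the 
definition of positive semidefinite matrices that for $1 \le j \le n$, 

\[
p_{jj} = \mathbf{e}_j^T P \mathbf{e}_j \ge 0.
\]

Since $p_{jj} \ge 0$ and $\sum_{j=1}^{n} p_{jj} = \operatorname{tr}(P) = 1$, we have by Lemma 8.1 and Theorem 8.13, 

\[
\begin{aligned}
\operatorname{tr}(DH^2) - [\operatorname{tr}(DH)]^2 &= \operatorname{tr}(P\Lambda^2) - [\operatorname{tr}(P\Lambda)]^2 \\
&= \sum_{j=1}^{n} p_{jj} \lambda_j^2 - \Big( \sum_{j=1}^{n} p_{jj} \lambda_j \Big)^2 \le \frac{1}{2} \sum_{j=1}^{n} \lambda_j^2 = \frac{\|H\|_F^2}{2}.
\end{aligned}
\]

Then (8.17) holds. Applying (8.17) and (8.14) to (8.16), we finally obtain 

\[
\|XY - YX\|_F^2 \le 4\|X\|_F^2 \cdot \frac{\|A\|_F^2 + \|B\|_F^2}{2} = 2\|X\|_F^2 \|Y\|_F^2.
\]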
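The trace identity (8.10) and the final bound can be checked numerically on random complex matrices. The sketch below is a sanity check under assumed sample data (a fixed seed and $3 \times 3$ matrices), not a substitute for the proof; all helper names are illustrative.

```python
import random

def ct(M):
    """Conjugate transpose of a square matrix given as a list of rows."""
    n = len(M)
    return [[M[j][i].conjugate() for j in range(n)] for i in range(n)]

def mul(A, B):
    """Matrix product AB."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def add(A, B, sign=1):
    """Entrywise A + sign * B."""
    n = len(A)
    return [[A[i][j] + sign * B[i][j] for j in range(n)] for i in range(n)]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def frob_sq(M):
    """Squared Frobenius norm."""
    return sum(abs(x) ** 2 for row in M for x in row)

def random_matrix(n, rng):
    return [[complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
            for _ in range(n)]

rng = random.Random(1)
X, Y = random_matrix(3, rng), random_matrix(3, rng)
Xs, Ys = ct(X), ct(Y)

comm_sq = frob_sq(add(mul(X, Y), mul(Y, X), -1))          # ||XY - YX||_F^2
lhs = comm_sq + frob_sq(add(mul(Xs, Y), mul(Y, Xs)))      # left side of (8.10)
rhs = trace(mul(add(mul(Xs, X), mul(X, Xs)),
                add(mul(Ys, Y), mul(Y, Ys)))).real         # right side of (8.10)

identity_residual = abs(lhs - rhs)
bound_holds = comm_sq <= 2 * frob_sq(X) * frob_sq(Y) + 1e-12
print(identity_residual < 1e-9, bound_holds)
```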
x 2 


Exercises 


Elementary exercises 


8.1 Express the following quadratic forms in the matrix notation $\mathbf{x}^T A \mathbf{x}$, where A 
is a symmetric matrix. 
(a) $x_1 x_2 + x_1 x_3 + x_2 x_3$. (b) $5x_1^2 + 5x_1 x_2$. (c) $4x_1^2 - 9x_2^2 - 6x_1 x_2$. 


8.2 Determine which of the following matrices are positive definite. 


011 1 2 1 
Gy 2-1]. lioil. @ |211 
1 1 0 1 1 3 
8.3 Find the maximum and minimum values of each given quadratic form subject 
to the constraint $x_1^2 + x_2^2 = 1$. Then determine values of $x_1$ and $x_2$ at which the 
maximum and minimum occur. 
(a) $f(x_1, x_2) = x_1^2 + 4x_1 x_2 + x_2^2$. (b) $f(x_1, x_2) = 5x_1^2 + 4x_1 x_2 + 2x_2^2$. 
8.4 Prove Theorem 8.1 (b). 
8.5 Determine which of the following quadratic forms are positive definite. 
(a) $90x_1^2 + 130x_2^2 + 71x_3^2 - 12x_1 x_2 + 48x_1 x_3 - 60x_2 x_3$. 

(b) $-5x_1^2 - 6x_2^2 - 4x_3^2 + 4x_1 x_2 + 4x_1 x_3$. 


8.6 In each part, find all values of k for which the quadratic form is positive definite. 
(a) $x_1^2 + k x_2^2 - 4x_1 x_2$. 


(b) 2x? + (2 + k)a3 + kx? + 2£1£2 — 2x73 + £23. 


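8.7 Show that if $A, B \in \mathbb{R}^{n \times n}$ are positive semidefinite and $\alpha, \beta \in \mathbb{R}$ are 
nonnegative, then $\alpha A + \beta B$ is positive semidefinite. 


8.8 Let $\mathbf{x}^T A \mathbf{x}$ be a quadratic form and $T: \mathbb{R}^n \to \mathbb{R}$ be defined by $T(\mathbf{x}) = \mathbf{x}^T A \mathbf{x}$. 
(a) Show that $T(\mathbf{x} + \mathbf{y}) = T(\mathbf{x}) + 2\mathbf{x}^T A \mathbf{y} + T(\mathbf{y})$. 
(b) Show that $T(k\mathbf{x}) = k^2 T(\mathbf{x})$, where k is a scalar. 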

8.9 Prove Theorem 8.3. 

8.10 Prove Theorem 8.4. 

8.11 Prove Theorem 8.8 (b) and (c). 

8.12 In each part, find real numbers $\alpha$ and $\beta$ that satisfy the following equation. 

(a) $\alpha i + \beta(1 + i) = 3 + 6i$. (b) $\alpha(2 + 3i) + \beta(1 - 4i) = -1 + 4i$. 


8.13 Let u = [1,0, —i], v = [1 +i, 1,1 — 2i], and w = [0,i,2]. Express the following 
vectors as linear combinations of u, v, and w. 


8.14 Which of the following sets of vectors in $\mathbb{C}^3$ are linearly independent? 

(a) $\mathbf{u}_1 = [1 - i, 1, 0]$, $\mathbf{u}_2 = [2, 1 + i, 0]$, $\mathbf{u}_3 = [1 + i, i, 0]$. 
(b) $\mathbf{u}_1 = [1, 0, -i]$, $\mathbf{u}_2 = [1 + i, 1, 1 - 2i]$, $\mathbf{u}_3 = [0, i, 2]$. 
(c) $\mathbf{u}_1 = [i, 0, 2 - i]$, $\mathbf{u}_2 = [0, 1, i]$, $\mathbf{u}_3 = [-i, -1 - 4i, 3]$. 


8.15 If $\mathbf{u} = [u_1, u_2]$, $\mathbf{v} = [v_1, v_2] \in \mathbb{C}^2$, determine which of the following functions 
f are inner products. 

(a) $f(\mathbf{u}, \mathbf{v}) = 3u_1 \bar{v}_1 + 2u_2 \bar{v}_2$. 
(b) $f(\mathbf{u}, \mathbf{v}) = u_1 \bar{v}_1 + (1 + i)u_1 \bar{v}_2 + (1 - i)u_2 \bar{v}_1 + 3u_2 \bar{v}_2$. 

8.16 Find $\|\mathbf{x}\|$ using the Euclidean inner product on $\mathbb{C}^2$. 
(a) $\mathbf{x} = [1, i]$. (b) $\mathbf{x} = [1 - i, 1 + i]$. (c) $\mathbf{x} = [-i, 3i]$. 


8.17 Show that the vectors $\mathbf{u}_1 = [i, i, i]$, $\mathbf{u}_2 = [-2i, i, i]$, and $\mathbf{u}_3 = [0, -i, i]$ form an 
orthogonal basis for $\mathbb{C}^3$ with the Euclidean inner product. By normalizing each of 
these vectors, find an orthonormal set. 


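8.18 Show that if u and v are vectors in a complex inner product space, then 

\[
\langle \mathbf{u}, \mathbf{v} \rangle = \frac{1}{4}\|\mathbf{u} + \mathbf{v}\|^2 - \frac{1}{4}\|\mathbf{u} - \mathbf{v}\|^2 + \frac{i}{4}\|\mathbf{u} + i\mathbf{v}\|^2 - \frac{i}{4}\|\mathbf{u} - i\mathbf{v}\|^2.
\]

8.19 Show that if $\{\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_n\}$ is an orthonormal basis for a complex inner 
product space V, then for any vectors u and v in V, 

\[
\langle \mathbf{u}, \mathbf{v} \rangle = \langle \mathbf{u}, \mathbf{w}_1 \rangle \overline{\langle \mathbf{v}, \mathbf{w}_1 \rangle} + \langle \mathbf{u}, \mathbf{w}_2 \rangle \overline{\langle \mathbf{v}, \mathbf{w}_2 \rangle} + \cdots + \langle \mathbf{u}, \mathbf{w}_n \rangle \overline{\langle \mathbf{v}, \mathbf{w}_n \rangle}.
\]

8.20 Let $A \in \mathbb{C}^{n \times n}$. Show that $A = 0$ if and only if $\mathbf{x}^* A \mathbf{x} = 0$ for any $\mathbf{x} \in \mathbb{C}^n$. 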
8.21 Prove Theorem 8.10. 
8.22 In each part, find a unitary matrix P that diagonalizes A, and find $P^* A P$. 

(a) $A = \begin{bmatrix} 2 & -i \\ i & 2 \end{bmatrix}$. 
(b) $A = \begin{bmatrix} 6 & 2 + 2i \\ 2 - 2i & 4 \end{bmatrix}$. 


8.23 Let A and B be n x n Hermitian matrices. 
(a) Show that A+ B is a Hermitian matrix. 
(b) Show that AB is a Hermitian matrix if and only if AB = BA. 


8.24 In each part, verify that the matrix is unitary and find its inverse. 
3 4i 1 v2 v2 


b 3 


1 
GO) g i 2 
sä 3i sg IF 


8.25 Prove Theorem 8.14. 


Challenge exercises 


8.26 Show that if $A \in \mathbb{R}^{n \times n}$ is symmetric and $A^2 = 0$, then $A = 0$. 


8.27 Let 

\[
f(x, y, z) = \frac{2x^2 + y^2 - 4xy - 4yz}{x^2 + y^2 + z^2}, \qquad x^2 + y^2 + z^2 \ne 0.
\]

Find the maximum and minimum values of the function $f(x, y, z)$, and determine 
values of x, y, and z at which the maximum and minimum occur. 


8.28 Let $\mathbf{x} = [x_1, x_2]^T$ and $\mathbf{y} = [y_1, y_2]^T$. Find an orthogonal matrix Q such that 
the change of variable $\mathbf{x} = Q\mathbf{y}$ transforms the quadratic form 

\[
f(x_1, x_2) = 2x_1^2 - 4x_1 x_2 + 5x_2^2
\]

into a new quadratic form in the variables $y_1$ and $y_2$ with no cross-product terms. 

8.29 Let A be a symmetric matrix such that 

\[
A^3 - 4A^2 + 5A = 2I,
\]

where I is the identity matrix. Show that A is symmetric positive definite. 


8.30 Let A be a symmetric positive definite matrix. Show that there exists a 
symmetric positive definite matrix B such that $A = B^2$. 


8.31 Let $A \in \mathbb{R}^{m \times n}$ and 

\[
B = \lambda I + A^T A,
\]

where $\lambda > 0$ and I is the identity matrix. Show that B is symmetric positive definite. 


8.32 Let A and B be symmetric positive semidefinite matrices of the same size. 
Show that $\operatorname{tr}(AB) \ge 0$. 


8.33 Let $A = [a_{ij}]$ and $B = [b_{ij}]$ be symmetric positive definite matrices of the 
same size. Show that $C = [a_{ij} b_{ij}]$ is a symmetric positive definite matrix. 


8.34 Let $A = [a_{ij}]$ be an $n \times n$ symmetric positive semidefinite matrix. Show that 
(a) $a_{ii} \ge 0$ for $1 \le i \le n$. 
(b) If $a_{ii} = 0$, then the ith row and ith column of A consist entirely of 0. 

8.35 Prove Theorem 8.11. 


8.36 Let $B \in \mathbb{C}^{n \times n}$ be invertible. Show that $A = B^* B$ is Hermitian positive 
definite. 


8.37 Let A be an $n \times n$ Hermitian matrix with eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$. Show 
that 

\[
(A - \lambda_1 I)(A - \lambda_2 I) \cdots (A - \lambda_n I) = 0,
\]

where I is the identity matrix. 


8.38 Let A and B be Hermitian matrices of the same size. Show that if AB is 
Hermitian, then every eigenvalue $\lambda$ of AB can be written as $\lambda = \mu\nu$, where $\mu$ is an 
eigenvalue of A and $\nu$ is an eigenvalue of B. 


8.39 Let $A \in \mathbb{C}^{m \times n}$ and $\mathbf{x} \in \mathbb{C}^n$. Show that $(A^* A)\mathbf{x} = \mathbf{0}$ if and only if $A\mathbf{x} = \mathbf{0}$. 


8.40 Let $A \in \mathbb{C}^{m \times n}$. Show that $\operatorname{tr}(A^* A) = 0$ if and only if $A = 0$. 

Appendix A 


Independence of Axioms 


An axiom is independent if it cannot be proved from the other axioms. Independence 
is desirable when one seeks a minimal set of axioms. In this appendix, 
we study the independence of the axioms of vector spaces. For convenience, we restate 
the definition from Subsection 4.1.1 here. 


Definition Let V be a nonempty set of objects on which two operations are defined, 
addition and scalar multiplication. It requires that V is closed under the addition 
and scalar multiplication, i.e., for each pair of objects u and v in V, u +v is in V; 
for each scalar k and each object u in V, ku is in V. Then V is called a vector 
space and the objects in V are said to be vectors if the following eight axioms are 
satisfied for all u, v, and w in V. 


(i) u + v = v + u. 
(ii) u + (v + w) = (u + v) + w. 


(iii) There is an object 0 in V, called a zero vector for V, such that for all u in V, 
u + 0 = u. 


(iv) For each u in V, there is an object −u in V, called a negative of u, such that 
u + (−u) = 0. 

(v) k(u + v) = ku + kv. 
(vi) (k + l)u = ku + lu. 
(vii) k(lu) = (kl)u. 
(viii) 1u = u. 
Here k and l are scalars. 


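Actually, Axiom (i) is not independent because it can be deduced from the other 
axioms [13, 20]. We next use the other axioms to prove Axiom (i). 


Lemma A.1 We have $(-\mathbf{u}) + \mathbf{u} = \mathbf{0}$ for each vector $\mathbf{u}$ in V. 


Proof For any $\mathbf{u} \in V$, we have $-\mathbf{u} \in V$ by Axiom (iv). Then it follows from Axiom 
(iv) again that $-(-\mathbf{u}) \in V$. Thus, 

\[
\begin{aligned}
(-\mathbf{u}) + \mathbf{u} &= (-\mathbf{u}) + \mathbf{u} + \mathbf{0} && [\text{Axiom (iii)}] \\
&= (-\mathbf{u}) + \mathbf{u} + [(-\mathbf{u}) + [-(-\mathbf{u})]] && [\text{Axiom (iv)}] \\
&= (-\mathbf{u}) + [\mathbf{u} + (-\mathbf{u})] + [-(-\mathbf{u})] && [\text{Axiom (ii)}] \\
&= (-\mathbf{u}) + \mathbf{0} + [-(-\mathbf{u})] && [\text{Axiom (iv)}] \\
&= (-\mathbf{u}) + [-(-\mathbf{u})] && [\text{Axiom (iii)}] \\
&= \mathbf{0}. && [\text{Axiom (iv)}]
\end{aligned}
\]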


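Lemma A.2 We have $\mathbf{0} + \mathbf{u} = \mathbf{u}$ for each vector $\mathbf{u}$ in V. 


Proof We have 

\[
\begin{aligned}
\mathbf{0} + \mathbf{u} &= [\mathbf{u} + (-\mathbf{u})] + \mathbf{u} && [\text{Axiom (iv)}] \\
&= \mathbf{u} + [(-\mathbf{u}) + \mathbf{u}] && [\text{Axiom (ii)}] \\
&= \mathbf{u} + \mathbf{0} && [\text{Lemma A.1}] \\
&= \mathbf{u}. && [\text{Axiom (iii)}]
\end{aligned}
\]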


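Theorem A.1 For all vectors $\mathbf{u}$ and $\mathbf{v}$ in V, we have 

\[
\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}.
\]

Proof We deduce 

\[
\begin{aligned}
\mathbf{u} + \mathbf{v} &= \mathbf{0} + \mathbf{u} + \mathbf{v} + \mathbf{0} && [\text{Lemma A.2, Axiom (iii)}] \\
&= [(-\mathbf{u}) + \mathbf{u}] + \mathbf{u} + \mathbf{v} + [\mathbf{v} + (-\mathbf{v})] && [\text{Lemma A.1, Axiom (iv)}] \\
&= (-\mathbf{u}) + (\mathbf{u} + \mathbf{u} + \mathbf{v} + \mathbf{v}) + (-\mathbf{v}) && [\text{Axiom (ii)}] \\
&= (-\mathbf{u}) + (1\mathbf{u} + 1\mathbf{u} + 1\mathbf{v} + 1\mathbf{v}) + (-\mathbf{v}) && [\text{Axiom (viii)}] \\
&= (-\mathbf{u}) + (2\mathbf{u} + 2\mathbf{v}) + (-\mathbf{v}) && [\text{Axioms (ii), (vi)}] \\
&= (-\mathbf{u}) + 2(\mathbf{u} + \mathbf{v}) + (-\mathbf{v}) && [\text{Axiom (v)}] \\
&= (-\mathbf{u}) + [(\mathbf{u} + \mathbf{v}) + (\mathbf{u} + \mathbf{v})] + (-\mathbf{v}) && [\text{Axioms (vi), (viii)}] \\
&= [(-\mathbf{u}) + \mathbf{u}] + \mathbf{v} + \mathbf{u} + [\mathbf{v} + (-\mathbf{v})] && [\text{Axiom (ii)}] \\
&= \mathbf{0} + \mathbf{v} + \mathbf{u} + \mathbf{0} && [\text{Lemma A.1, Axiom (iv)}] \\
&= \mathbf{v} + \mathbf{u}. && [\text{Lemma A.2, Axiom (iii)}]
\end{aligned}
\]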